content-ingestion
Community
Rapidly map and fetch web content for your AI.
Data & Analytics
#automation #data collection #web scraping #sitemap #trafilatura #content ingestion #web content
Author: boringdata
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Gathering web content for AI analysis can be slow and inefficient, especially on large websites. This skill streamlines the process with a "map-then-fetch" workflow: it first maps a site's URLs, then fetches only the pages you select, one at a time or in parallel.
Core Features & Use Cases
- Sitemap Mapping: Quickly discover all URLs from a website's sitemap without downloading content.
- Selective Fetching: Review discovered URLs and fetch only the content you need, individually or in parallel batches.
- Date Discovery: Automatically extract publish dates from blogrolls and changelogs for freshness tracking.
- Use Case: You need to analyze all blog posts from a competitor's website. This skill lets you map their sitemap, filter for blog URLs, and then fetch hundreds of posts in minutes, complete with publish dates, ready for analysis.
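The map-then-fetch workflow described above can be sketched with the Python standard library alone. This is a minimal illustration, not the skill's actual scripts: the names `map_sitemap`, `select_urls`, and `fetch_many` are hypothetical, and the `fetch` callable is left as a parameter so any downloader can be plugged in.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def map_sitemap(sitemap_xml: str) -> List[str]:
    """Map step: list every <loc> URL in a sitemap without downloading pages."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


def select_urls(urls: List[str], fragment: str) -> List[str]:
    """Selection step: keep only the URLs that contain `fragment`."""
    return [url for url in urls if fragment in url]


def fetch_many(
    urls: List[str],
    fetch: Callable[[str], Optional[str]],
    max_workers: int = 8,
) -> Dict[str, Optional[str]]:
    """Fetch step: call the (hypothetical) `fetch` on each URL in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each URL with its result.
        return dict(zip(urls, pool.map(fetch, urls)))
```

Keeping the map step separate from the fetch step means the URL list can be reviewed and filtered before any page is downloaded, which is the point of the workflow.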
Quick Start
Map all URLs from https://www.example.com and then fetch all documents whose URL contains "/blog/".
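For orientation, the request above roughly corresponds to the following trafilatura calls. This is a sketch under assumptions: the source does not show how the skill's scripts use trafilatura, and the names `to_record` and `map_then_fetch` are hypothetical. It uses `sitemap_search`, `fetch_url`, and `extract` from trafilatura, and reads the publish date from the extracted metadata.

```python
import json
from typing import Dict, List, Optional


def to_record(url: str, extracted_json: Optional[str]) -> Dict[str, Optional[str]]:
    """Reduce trafilatura's JSON output to a small record with the publish date."""
    data = json.loads(extracted_json) if extracted_json else {}
    return {
        "url": url,
        "title": data.get("title"),
        "date": data.get("date"),  # None when no date could be extracted
        "text": data.get("text"),
    }


def map_then_fetch(homepage: str, fragment: str) -> List[Dict[str, Optional[str]]]:
    """Map the site's sitemap, keep URLs containing `fragment`, fetch each one."""
    # Imported lazily so to_record stays usable without trafilatura installed.
    import trafilatura
    from trafilatura.sitemaps import sitemap_search

    # Map, then filter: no page content is downloaded for rejected URLs.
    urls = [url for url in sitemap_search(homepage) if fragment in url]
    records = []
    for url in urls:
        downloaded = trafilatura.fetch_url(url)  # None if the download fails
        extracted = (
            trafilatura.extract(downloaded, output_format="json", with_metadata=True)
            if downloaded
            else None
        )
        records.append(to_record(url, extracted))
    return records
```

Calling `map_then_fetch("https://www.example.com", "/blog/")` would return one record per matching URL. Pages that fail to download or extract still appear in the result, with empty fields, so gaps are visible rather than silently dropped.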
Dependency Matrix
Required Modules
trafilatura
Components
scripts
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: content-ingestion
Download link: https://github.com/boringdata/kurt-demo/archive/main.zip#content-ingestion
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.