web-content-scraper
Community
Scrape clean web content with image attribution.
Software Engineering, #markdown, #web scraping, #content extraction, #playwright, #web-content, #image attribution
Author: sekka1
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill enables reliable extraction of the main article content from web pages while filtering noise like ads, headers, footers, and navigation. It also downloads relevant images and preserves their source URLs for copyright attribution, delivering clean, markdown-ready content for AI contexts.
Core Features & Use Cases
- Main content extraction: Retrieve the primary article or page content and convert it to markdown.
- Image attribution: Download images with alt text and preserve source attribution metadata.
- Robust to site variations: Works across blogs, documentation pages, and care guides by targeting common content regions and removing boilerplate.
- Use Case: Feed collected web content into your moss wall knowledge base to answer questions with both text and referenced images.
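The filtering and attribution features above can be illustrated with a minimal sketch. It uses only the Python standard library and is not the Skill's actual implementation: the `NOISE_TAGS` set, the `MainContentParser` class, and the `extract` helper are hypothetical names, and treating `<p>` elements outside boilerplate regions as "main content" is a simplifying assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tags treated as boilerplate in this sketch (an assumption, not the Skill's real list).
NOISE_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class MainContentParser(HTMLParser):
    """Collects paragraph text and image records found outside boilerplate regions."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.noise_depth = 0       # greater than 0 while inside a boilerplate element
        self.in_paragraph = False
        self.buffer = []
        self.paragraphs = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.noise_depth += 1
        elif self.noise_depth == 0:
            if tag == "p":
                self.in_paragraph = True
                self.buffer = []
            elif tag == "img":
                attributes = dict(attrs)
                if attributes.get("src"):
                    # Keep the absolute source URL so the image can be attributed later.
                    self.images.append({
                        "alt": attributes.get("alt") or "",
                        "source_url": urljoin(self.base_url, attributes["src"]),
                    })

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.noise_depth > 0:
            self.noise_depth -= 1
        elif tag == "p" and self.in_paragraph:
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.paragraphs.append(text)
            self.in_paragraph = False

    def handle_data(self, data):
        if self.noise_depth == 0 and self.in_paragraph:
            self.buffer.append(data)


def extract(html, base_url):
    """Return (paragraphs, images) for the given HTML string."""
    parser = MainContentParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.paragraphs, parser.images


sample = """
<nav><p>Home | About</p></nav>
<article><p>Moss prefers   indirect light.</p>
<img src="/img/moss.jpg" alt="Sheet moss"></article>
<footer><p>Copyright</p></footer>
"""
paragraphs, images = extract(sample, "https://example.com/article")
print(paragraphs)
print(images)
```

A real extractor would likely need richer heuristics (headings, lists, content-density scoring), but the depth counter shows the basic idea of skipping everything nested inside a boilerplate region.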
Quick Start
- Provide a URL to scrape (e.g., https://example.com/article). The Skill returns the cleaned main content as markdown, including image captions and attribution URLs.
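As a rough illustration of what markdown output with captions and attribution URLs could look like, the sketch below renders extracted text and image records. The `to_markdown` function, the record keys (`alt`, `local_path`, `source_url`), and the `*Source: ...*` caption format are all assumptions made for this example; the Skill's real output format may differ.

```python
def to_markdown(paragraphs, images):
    """Render paragraphs and image records as markdown with source attribution."""
    parts = list(paragraphs)
    for image in images:
        alt = image["alt"] or "image"
        # Link to the downloaded copy, and keep the original URL for attribution.
        parts.append(
            f"![{alt}]({image['local_path']})\n"
            f"*Source: <{image['source_url']}>*"
        )
    return "\n\n".join(parts) + "\n"


markdown = to_markdown(
    ["Moss prefers indirect light."],
    [{
        "alt": "Sheet moss",
        "local_path": "images/moss.jpg",
        "source_url": "https://example.com/img/moss.jpg",
    }],
)
print(markdown)
```

Keeping the source URL next to each image means the attribution travels with the content when it is copied into a knowledge base.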
Dependency Matrix
Required Modules
playwright
Components
scripts
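Since Playwright is the only listed module, the bundled scripts presumably use it to load pages in a real browser before extraction. The following sketch shows one way that might look with Playwright's synchronous Python API. The `fetch_rendered_html` function, its `timeout_ms` parameter, and the choice of Chromium with a `networkidle` wait are assumptions for illustration, not details confirmed by the Skill.

```python
from urllib.parse import urlparse


def fetch_rendered_html(url, timeout_ms=30_000):
    """Return the page's HTML after the browser has rendered it."""
    if urlparse(url).scheme not in ("http", "https"):
        raise ValueError(f"unsupported URL scheme: {url!r}")

    # Imported lazily so this module still loads where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # Wait for network activity to settle so script-rendered content is present.
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()
```

Running this requires the package and a browser binary, typically installed with `pip install playwright` followed by `playwright install chromium`.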
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: web-content-scraper
Download link: https://github.com/sekka1/mosswall/archive/main.zip#web-content-scraper
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.