web-scraping
Community
Extract web content reliably.
Author: AlexAlvarezAlmendros
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill addresses the problem of extracting content from websites that deploy anti-bot measures, sit behind paywalls, or load their content dynamically.
Core Features & Use Cases
- Multi-Strategy Scraping: Employs a cascade of methods (requests, Trafilatura, Playwright) with automatic fallbacks for robust data retrieval.
- Anti-Bot Bypass: Utilizes Playwright with stealth mode and rotating user agents to circumvent detection.
- Content Extraction: Extracts main article content and titles, and handles JavaScript-rendered pages.
- Poison Pill Detection: Identifies and flags paywalls, captchas, and rate limits.
- Social Media Scraping: Includes patterns for YouTube (metadata, video/audio download, transcripts) and Instagram (post data, media download) using yt-dlp and instaloader.
- Undocumented API Discovery: Provides methods for reverse-engineering and utilizing hidden APIs.
- Use Case: Scrape product details from an e-commerce site that heavily relies on JavaScript, ensuring you get all product information despite anti-scraping measures.
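The multi-strategy cascade and poison pill detection described above can be sketched roughly as follows. This is a minimal illustration, not the Skill's actual implementation: the names `ScrapeResult`, `detect_poison_pill`, `scrape_with_fallbacks`, and the marker phrases in `POISON_MARKERS` are all hypothetical. Each strategy is passed in as a plain callable, so the real fetchers (for example wrappers around requests, Trafilatura, or Playwright) are assumed rather than shown.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical marker phrases; a real detector would need tuning per site.
POISON_MARKERS = {
    "paywall": ("subscribe to continue", "subscribers only"),
    "captcha": ("captcha", "verify you are human"),
    "rate_limit": ("too many requests", "rate limit"),
}


@dataclass
class ScrapeResult:
    strategy: str  # name of the strategy that succeeded
    text: str      # extracted page text


def detect_poison_pill(text: str, status_code: int = 200) -> Optional[str]:
    """Return a label such as 'paywall' if the response looks blocked, else None."""
    if status_code == 429:  # HTTP 429 Too Many Requests
        return "rate_limit"
    lowered = text.lower()
    for label, phrases in POISON_MARKERS.items():
        if any(phrase in lowered for phrase in phrases):
            return label
    return None


def scrape_with_fallbacks(
    url: str,
    strategies: Iterable[Tuple[str, Callable[[str], str]]],
) -> ScrapeResult:
    """Try each (name, fetch) pair in order and return the first clean result."""
    failures = []
    for name, fetch in strategies:
        try:
            text = fetch(url)
        except Exception as exc:  # any failure triggers the next fallback
            failures.append(f"{name}: {exc!r}")
            continue
        pill = detect_poison_pill(text)
        if pill is None and text.strip():
            return ScrapeResult(strategy=name, text=text)
        failures.append(f"{name}: {pill or 'empty response'}")
    raise RuntimeError("all strategies failed: " + "; ".join(failures))
```

Ordering the strategies from cheapest to most expensive (plain HTTP first, a headless browser last) keeps the common case fast while still covering JavaScript-heavy pages; that ordering is an assumption based on the cascade listed above.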
Quick Start
Use the web-scraping skill to extract the main content and title from the URL 'https://example.com'.
Dependency Matrix
Required Modules
- requests
- trafilatura
- playwright
- yt-dlp
- instaloader
- fake_useragent
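Before running the Skill, it may help to confirm that the required modules are importable. The sketch below uses only the standard library; the helper name `missing_modules` is hypothetical. Note that the yt-dlp package is imported under the name `yt_dlp`, so the import names differ slightly from the package names listed here.

```python
from importlib.util import find_spec

# Import names for the required modules. The yt-dlp package is
# imported as yt_dlp; the others match their package names.
REQUIRED_IMPORTS = [
    "requests",
    "trafilatura",
    "playwright",
    "yt_dlp",
    "instaloader",
    "fake_useragent",
]


def missing_modules(names=REQUIRED_IMPORTS):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]
```

Calling `missing_modules()` returns an empty list when everything is installed, and otherwise lists the modules that still need to be installed.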
Components
- scripts
- references
💻 Claude Code Installation
Recommended: Let Claude install it automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: web-scraping
Download link: https://github.com/AlexAlvarezAlmendros/HomeScrapper/archive/main.zip#web-scraping
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.