web-scraping

Community

Extract web content reliably.

Author: AlexAlvarezAlmendros
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill extracts the main content from websites, including sites that use anti-bot measures, paywalls, or dynamically loaded content.

Core Features & Use Cases

  • Multi-Strategy Scraping: Employs a cascade of methods (requests, Trafilatura, Playwright) with automatic fallbacks for robust data retrieval.
  • Anti-Bot Bypass: Utilizes Playwright with stealth mode and rotating user agents to circumvent detection.
  • Content Extraction: Extracts main article content and titles, and handles JavaScript-rendered pages.
  • Poison Pill Detection: Identifies and flags paywalls, captchas, and rate limits.
  • Social Media Scraping: Includes patterns for YouTube (metadata, video/audio download, transcripts) and Instagram (post data, media download) using yt-dlp and instaloader.
  • Undocumented API Discovery: Provides methods for reverse-engineering and utilizing hidden APIs.
  • Use Case: Scrape product details from an e-commerce site that relies heavily on JavaScript and applies anti-scraping measures.
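
The fallback cascade and poison pill detection described above could be sketched roughly as follows. This is a minimal illustration, not the Skill's actual code: the names `FetchResult`, `detect_poison_pill`, `scrape_with_fallbacks`, and the phrases in `POISON_MARKERS` are all hypothetical, and each strategy is assumed to be a plain callable that wraps one backend (for example requests, Trafilatura, or Playwright).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical marker phrases (an assumption for this sketch);
# a real detector would need a tuned, site-specific list.
POISON_MARKERS = {
    "captcha": ("captcha", "verify you are human"),
    "paywall": ("subscribe to continue", "subscribers only"),
    "rate_limit": ("too many requests", "rate limit"),
}


@dataclass
class FetchResult:
    strategy: str      # name of the strategy that produced this result
    status_code: int   # HTTP status code reported by the strategy
    text: str          # extracted text or raw HTML


def detect_poison_pill(status_code: int, text: str) -> Optional[str]:
    """Return a label such as 'captcha' if the response looks blocked, else None."""
    if status_code == 429:  # HTTP 429 Too Many Requests
        return "rate_limit"
    lowered = text.lower()
    for label, phrases in POISON_MARKERS.items():
        if any(phrase in lowered for phrase in phrases):
            return label
    return None


Strategy = Callable[[str], FetchResult]


def scrape_with_fallbacks(
    url: str, strategies: Sequence[Strategy]
) -> Tuple[Optional[FetchResult], List[str]]:
    """Try each strategy in order; return the first clean result plus a log of flags."""
    flags: List[str] = []
    for strategy in strategies:
        try:
            result = strategy(url)
        except Exception as exc:  # any failure means: move on to the next strategy
            name = getattr(strategy, "__name__", "strategy")
            flags.append(f"{name}: error: {exc}")
            continue
        pill = detect_poison_pill(result.status_code, result.text)
        if pill is None and result.text.strip():
            return result, flags
        # Blocked or empty response: record why and fall through
        flags.append(f"{result.strategy}: {pill or 'empty'}")
    return None, flags
```

In practice, the list passed as `strategies` would be ordered from cheapest to most expensive, for instance a plain HTTP request first and a headless browser last, so the slower methods run only when the faster ones are blocked or return nothing.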

Quick Start

Use the web-scraping skill to extract the main content and title from the URL 'https://example.com'.

Dependency Matrix

Required Modules

requests, trafilatura, playwright, yt-dlp, instaloader, fake_useragent

Components

scripts, references

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: web-scraping
Download link: https://github.com/AlexAlvarezAlmendros/HomeScrapper/archive/main.zip#web-scraping

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.