web-scraping

Name: web-scraping
Author: AlexAlvarezAlmendros

Community

Extract web content reliably.

Software Engineering #data extraction #api #web scraping #yt-dlp #playwright #anti-bot #instaloader

Version: 1.0.0

Installs: 0

System Documentation

What problem does it solve?

This Skill extracts content from websites reliably, including sites that use anti-bot measures, paywalls, or dynamically loaded content.

Core Features & Use Cases

Multi-Strategy Scraping: Employs a cascade of methods (requests, Trafilatura, Playwright) with automatic fallbacks for robust data retrieval.
Anti-Bot Bypass: Utilizes Playwright with stealth mode and rotating user agents to circumvent detection.
Content Extraction: Extracts main article content, titles, and can handle JavaScript-rendered pages.
Poison Pill Detection: Identifies and flags paywalls, captchas, and rate limits.
Social Media Scraping: Includes patterns for YouTube (metadata, video/audio download, transcripts) and Instagram (post data, media download) using yt-dlp and instaloader.
Undocumented API Discovery: Provides methods for reverse-engineering and utilizing hidden APIs.
Use Case: Scrape product details from an e-commerce site that heavily relies on JavaScript, ensuring you get all product information despite anti-scraping measures.
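
The cascade-with-fallbacks and poison pill ideas above can be sketched roughly as follows. This is a minimal illustration and not the Skill's actual code: the function names, the marker phrases, and the three stub strategies are all hypothetical, and the stubs stand in for real requests, Trafilatura, and Playwright calls so the example runs without network access.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical marker phrases; a real detector would need site-specific tuning.
POISON_PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "paywall": re.compile(r"subscribe to (continue|read)|subscribers only", re.I),
    "captcha": re.compile(r"captcha|verify you are (a )?human", re.I),
    "rate_limit": re.compile(r"too many requests|rate limit", re.I),
}


def detect_poison_pill(text: str) -> Optional[str]:
    """Return the name of the first matching poison pill, or None if none match."""
    for name, pattern in POISON_PATTERNS.items():
        if pattern.search(text):
            return name
    return None


Strategy = Callable[[str], str]


def scrape_with_fallbacks(url: str, strategies: List[Tuple[str, Strategy]]) -> dict:
    """Try each (name, strategy) in order and return the first clean, non-empty result."""
    failures: List[Tuple[str, str]] = []
    for name, fetch in strategies:
        try:
            text = fetch(url)
        except Exception as exc:  # any failure means "fall through to the next strategy"
            failures.append((name, f"error: {exc}"))
            continue
        if not text or not text.strip():
            failures.append((name, "empty"))
            continue
        pill = detect_poison_pill(text)
        if pill:
            failures.append((name, pill))
            continue
        return {"strategy": name, "text": text, "failures": failures}
    return {"strategy": None, "text": None, "failures": failures}


# Stub strategies (assumptions for illustration only).
def plain_request(url: str) -> str:
    raise ConnectionError("blocked")


def article_extractor(url: str) -> str:
    return "Please complete the CAPTCHA to continue."


def headless_browser(url: str) -> str:
    return "Example Domain: This domain is for use in examples."


result = scrape_with_fallbacks(
    "https://example.com",
    [
        ("plain_request", plain_request),
        ("article_extractor", article_extractor),
        ("headless_browser", headless_browser),
    ],
)
```

Ordering the strategies from cheapest to most expensive means a headless browser is launched only when the lighter methods fail or return a flagged page. The recorded failures show why each earlier strategy was skipped.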

Quick Start

Use the web-scraping skill to extract the main content and title from the URL 'https://example.com'.
