scrape-webpage

Official

Extract content, images, and metadata from any URL.

Author: aemsites
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This skill automates the initial data-extraction phase of a content migration: it scrapes webpage content, downloads images, and extracts page metadata. The output is prepared for import into AEM Edge Delivery Services, so this step no longer has to be done by hand.

Core Features & Use Cases

  • Comprehensive Content Extraction: Loads pages in a headless browser, scrolls to trigger lazy loading, and extracts cleaned HTML by removing non-content elements.
  • Intelligent Image Handling: Downloads all images (converting formats like WebP/SVG to PNG), fixes DOM references, and replaces remote image URLs with local paths so the cleaned HTML no longer depends on the source site.
  • Rich Metadata Extraction: Captures SEO-critical data including title, description, Open Graph, JSON-LD, and canonical links, preserving valuable page information.
  • Use Case: Provide a legacy webpage URL, and this skill will return a metadata.json with all extracted data, a screenshot.png, cleaned.html with local image paths, and an images/ folder, ready for the next steps in your migration workflow.
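To make the metadata feature concrete, here is a minimal sketch of the kind of extraction it describes (title, description, Open Graph, JSON-LD, canonical link). It is not the skill's own code: it is written in Python with only the standard library, and the names MetadataParser and extract_metadata, as well as the keys of the returned dictionary, are hypothetical and may not match the layout of the skill's metadata.json.

```python
import json
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Hypothetical helper: collects title, description, Open Graph, canonical, JSON-LD."""

    def __init__(self):
        super().__init__()
        # Key names are illustrative assumptions, not the skill's metadata.json schema.
        self.metadata = {"title": None, "description": None, "canonical": None,
                         "openGraph": {}, "jsonLd": []}
        self._capture = None  # "title" or "jsonld" while inside those elements
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._capture, self._buffer = "title", []
        elif tag == "script" and (a.get("type") or "").lower() == "application/ld+json":
            self._capture, self._buffer = "jsonld", []
        elif tag == "meta":
            name = (a.get("name") or "").lower()
            prop = (a.get("property") or "").lower()
            if name == "description":
                self.metadata["description"] = a.get("content")
            elif prop.startswith("og:"):
                self.metadata["openGraph"][prop] = a.get("content")
        elif tag == "link" and "canonical" in (a.get("rel") or "").lower().split():
            self.metadata["canonical"] = a.get("href")

    def handle_data(self, data):
        # Only buffer text while inside <title> or a JSON-LD <script>.
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._capture == "title":
            self.metadata["title"] = "".join(self._buffer).strip()
            self._capture = None
        elif tag == "script" and self._capture == "jsonld":
            try:
                self.metadata["jsonLd"].append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed JSON-LD rather than fail the whole page
            self._capture = None


def extract_metadata(html: str) -> dict:
    """Parse an HTML string and return the collected metadata dictionary."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    return parser.metadata
```

The sketch parses a static HTML string. The skill itself loads the page in a headless browser first, so content injected by scripts or lazy loading would only be visible to a parser like this after that rendering step.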

Quick Start

Use the scrape-webpage skill to extract content, images, and metadata from "https://www.example.com/about-us" and save it to the ./import-work directory.
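After a run, it can be useful to confirm that the expected artifacts were written before moving on. The sketch below checks a directory for the four outputs this page names (metadata.json, screenshot.png, cleaned.html, and an images/ folder). It assumes they sit directly inside the output directory, which this page does not confirm, and the function name missing_artifacts is hypothetical.

```python
from pathlib import Path

# Output names taken from the use case described on this page.
EXPECTED_FILES = ("metadata.json", "screenshot.png", "cleaned.html")
EXPECTED_DIRS = ("images",)


def missing_artifacts(output_dir) -> list:
    """Return the names of expected outputs not found directly under output_dir.

    Assumption: artifacts are written at the top level of the output directory.
    """
    root = Path(output_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing


if __name__ == "__main__":
    # "./import-work" is the directory used in the Quick Start prompt.
    gaps = missing_artifacts("./import-work")
    print("All artifacts present" if not gaps else f"Missing: {gaps}")
```

An empty list means every expected artifact was found. Any returned names point at what to re-run or inspect.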

Dependency Matrix

Required Modules

playwright, sharp

Components

scripts, references

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: scrape-webpage
Download link: https://github.com/aemsites/koassets/archive/main.zip#scrape-webpage

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.