scrape-webpage
Official
Extract content, images, and metadata from any URL.
Software Engineering
#automation #web scraping #content extraction #playwright #metadata extraction #aem migration #image download
Author: aemsites
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This skill automates the initial data extraction phase of content migration: it scrapes webpage content, downloads images, and extracts page metadata. The output is prepared for import into AEM Edge Delivery Services, replacing work that would otherwise be done by hand.
Core Features & Use Cases
- Comprehensive Content Extraction: Loads pages in a headless browser, scrolls to trigger lazy loading, and extracts cleaned HTML by removing non-content elements.
- Intelligent Image Handling: Downloads all images (converting formats like WebP/SVG to PNG), fixes DOM references, and replaces URLs with local paths for seamless migration.
- Rich Metadata Extraction: Captures SEO-critical data including title, description, Open Graph, JSON-LD, and canonical links, preserving valuable page information.
- Use Case: Provide a legacy webpage URL, and this skill will return a metadata.json with all extracted data, a screenshot.png, a cleaned.html with local image paths, and an images/ folder, ready for the next steps in your migration workflow.
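The metadata extraction feature can be illustrated with a small sketch. This is not the skill's own code (the skill depends on playwright and sharp); it is a standard-library Python approximation that pulls the fields named above (title, description, Open Graph, JSON-LD, canonical link) out of an HTML string. The names MetadataParser, extract_metadata, and SAMPLE_HTML, and the shape of the resulting dictionary, are assumptions made for illustration only.

```python
import json
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Hypothetical helper: collects title, description, Open Graph,
    canonical link, and JSON-LD blocks from an HTML document."""

    def __init__(self):
        super().__init__()
        # The dictionary layout is an assumption, not the skill's metadata.json schema.
        self.metadata = {"title": None, "description": None,
                         "canonical": None, "open_graph": {}, "json_ld": []}
        self._capture = None  # "title" or "json_ld" while inside that element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._capture, self._buffer = "title", []
        elif tag == "script" and (a.get("type") or "").lower() == "application/ld+json":
            self._capture, self._buffer = "json_ld", []
        elif tag == "meta":
            content = a.get("content")
            prop = a.get("property") or ""
            if (a.get("name") or "").lower() == "description":
                self.metadata["description"] = content
            elif prop.startswith("og:"):
                self.metadata["open_graph"][prop] = content
        elif tag == "link" and "canonical" in (a.get("rel") or "").lower().split():
            self.metadata["canonical"] = a.get("href")

    def handle_data(self, data):
        # Text is only buffered while inside <title> or a JSON-LD <script>.
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._capture == "title":
            self.metadata["title"] = "".join(self._buffer).strip()
            self._capture = None
        elif tag == "script" and self._capture == "json_ld":
            try:
                self.metadata["json_ld"].append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed JSON-LD rather than fail the whole page
            self._capture = None


def extract_metadata(html):
    """Parse an HTML string and return the collected metadata dictionary."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    return parser.metadata


# Made-up sample page used only to exercise the sketch.
SAMPLE_HTML = """<html><head>
<title>About Us</title>
<meta name="description" content="Who we are.">
<meta property="og:title" content="About Us | Example">
<link rel="canonical" href="https://www.example.com/about-us">
<script type="application/ld+json">{"@type": "Organization", "name": "Example"}</script>
</head><body><p>Hello</p></body></html>"""

metadata = extract_metadata(SAMPLE_HTML)
print(json.dumps(metadata, indent=2))
```

A real run would work on the DOM after the headless browser has rendered and scrolled the page, so metadata injected by scripts would also be visible; this sketch only sees static markup.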
Quick Start
Use the scrape-webpage skill to extract content, images, and metadata from "https://www.example.com/about-us" and save it to the ./import-work directory.
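After a run, it can be useful to confirm that the expected artifacts exist before moving on. The sketch below is a hypothetical check, not part of the skill: the file names come from the use case described above, but the assumption that they sit directly under the output directory, and the helper name missing_artifacts, are illustrative only.

```python
from pathlib import Path

# Names taken from the use case above; their placement directly under
# the output directory is an assumption.
EXPECTED_FILES = ("metadata.json", "screenshot.png", "cleaned.html")
EXPECTED_DIRS = ("images",)


def missing_artifacts(output_dir):
    """Return the expected artifact names that are absent from output_dir."""
    root = Path(output_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing


# An empty list means every expected artifact was found.
print(missing_artifacts("./import-work"))
```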
Dependency Matrix
Required Modules
playwright, sharp
Components
scripts, references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: scrape-webpage
Download link: https://github.com/aemsites/koassets/archive/main.zip#scrape-webpage
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.