croissant
Community
Generate verified ML dataset metadata.
Data & Analytics
#data verification #metadata generation #data provenance #croissant #dataset descriptor #mlcommons
Author: cbizon
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill automates the creation of machine-readable dataset descriptors (Croissant files) for ML datasets, so that every metadata claim is traceable to a crawled source rather than to the model's training knowledge.
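To make "machine-readable dataset descriptor" concrete, here is a minimal sketch of the general shape of a Croissant JSON-LD document, built with the Python standard library only. The helper name `minimal_croissant`, the abbreviated `@context`, and the example.org values are illustrative assumptions; a real Croissant file carries a much fuller context plus populated `distribution` and `recordSet` entries, and this is not the Skill's actual output.

```python
import json


def minimal_croissant(name: str, url: str, description: str) -> dict:
    """Build a bare-bones Croissant-style descriptor (illustrative subset only)."""
    return {
        # Abbreviated JSON-LD context; the full Croissant context has more terms.
        "@context": {
            "@vocab": "https://schema.org/",
            "sc": "https://schema.org/",
            "cr": "http://mlcommons.org/croissant/",
        },
        "@type": "sc:Dataset",
        "conformsTo": "http://mlcommons.org/croissant/1.0",
        "name": name,
        "url": url,
        "description": description,
        # Left empty here: file objects and record sets would be filled in
        # from crawled sources.
        "distribution": [],
        "recordSet": [],
    }


# Placeholder values, not a real dataset.
descriptor = minimal_croissant(
    name="example-dataset",
    url="https://example.org/dataset",
    description="Placeholder description for illustration.",
)
print(json.dumps(descriptor, indent=2))
```

Keeping the descriptor as a plain dictionary makes it easy to serialize with `json.dumps` and to inspect field by field before writing it to a `_croissant.json` file.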
Core Features & Use Cases
- Automated Croissant Generation: Creates _croissant.json, _provenance.json, and _incomplete.json files for datasets.
- Data Source Verification: Guarantees all metadata claims are backed by crawled URLs, enforcing strict "no training knowledge" and "authoritative source" rules.
- Use Case: When you have a publicly accessible dataset URL, use this Skill to generate a complete, verified Croissant metadata package that can be used by ML frameworks and catalogues.
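The "backed by crawled URLs" rule above can be pictured as a simple cross-check between the metadata and its provenance record. The sketch below is only an illustration: the layout of the provenance data (a mapping from field name to an object with a `source_url` key) and the function name `unsupported_fields` are assumptions, since the listing does not document the actual structure of _provenance.json.

```python
def unsupported_fields(metadata: dict, provenance: dict, crawled_urls: set) -> list:
    """Return metadata fields whose claims are not backed by a crawled URL.

    Assumed (hypothetical) provenance layout:
        {"<field>": {"source_url": "<url>"}, ...}
    """
    missing = []
    for field in metadata:
        if field.startswith("@"):
            continue  # JSON-LD keywords are structural, not factual claims
        source = provenance.get(field, {}).get("source_url")
        if source not in crawled_urls:
            missing.append(field)
    return missing


# Placeholder data, not taken from any real dataset.
metadata = {"@type": "sc:Dataset", "name": "Example", "license": "CC-BY-4.0"}
provenance = {"name": {"source_url": "https://example.org/about"}}
crawled = {"https://example.org/about"}

print(unsupported_fields(metadata, provenance, crawled))  # → ['license']
```

Under this assumed layout, fields that fail the check would be candidates for an "incomplete" report rather than for the final descriptor.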
Quick Start
Use the croissant skill to generate a Croissant file for the dataset at https://www.bindingdb.org.
Dependency Matrix
Required Modules
mlcroissant
Components
scripts, references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: croissant
Download link: https://github.com/cbizon/claussant/archive/main.zip#croissant
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.