croissant

Community

Generate verified ML dataset metadata.

Author: cbizon
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill automates the creation of machine-readable dataset descriptors (Croissant files) for ML datasets. Every metadata field must be traceable to a crawled source rather than to the model's training knowledge.

Core Features & Use Cases

  • Automated Croissant Generation: Creates _croissant.json, _provenance.json, and _incomplete.json files for datasets.
  • Data Source Verification: Guarantees all metadata claims are backed by crawled URLs, enforcing strict "no training knowledge" and "authoritative source" rules.
  • Use Case: When you have a publicly accessible dataset URL, use this Skill to generate a complete, verified Croissant metadata package that can be used by ML frameworks and catalogues.
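The "backed by crawled URLs" rule can be illustrated with a small sketch. The Skill's actual file schemas are not documented here, so the claim structure below (a field name mapped to a `value` and a `source_url`) and the function name `split_by_provenance` are hypothetical, chosen only to show the idea: a claim whose source was crawled is kept as verified, and anything else is set aside as incomplete.

```python
import json


def split_by_provenance(claims, crawled_urls):
    """Separate claims backed by a crawled URL from those that are not.

    `claims` maps a field name to {"value": ..., "source_url": ...}.
    This layout is an assumption for illustration, not the Skill's schema.
    """
    verified, incomplete = {}, {}
    for field, claim in claims.items():
        source = claim.get("source_url")
        if source in crawled_urls:
            verified[field] = claim
        else:
            # No crawled source: the claim must not appear as verified metadata.
            incomplete[field] = claim
    return verified, incomplete


# Placeholder claims; the values are examples, not real crawl results.
claims = {
    "name": {"value": "example name", "source_url": "https://www.bindingdb.org"},
    "license": {"value": "example license", "source_url": None},
}
crawled = {"https://www.bindingdb.org"}

verified, incomplete = split_by_provenance(claims, crawled)
print(json.dumps(sorted(verified)))
print(json.dumps(sorted(incomplete)))
```

In this sketch the `name` claim lands in the verified set because its source URL was crawled, while `license` has no source and is reported as incomplete instead of being filled in from memory.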

Quick Start

Use the croissant skill to generate a Croissant file for the dataset at https://www.bindingdb.org.

Dependency Matrix

Required Modules

mlcroissant

Components

scripts, references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: croissant
Download link: https://github.com/cbizon/claussant/archive/main.zip#croissant

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
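If you prefer to install by hand, the extraction step can be sketched as follows. This assumes the `#croissant` fragment in the download link names a folder one level below the archive's root directory, which is a guess about the repository layout; the function name `extract_skill` is hypothetical, and the zip file is assumed to be already downloaded.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_skill(zip_path, skill_name, skills_dir):
    """Copy <archive-root>/<skill_name>/... from the zip into skills_dir/<skill_name>/.

    Assumes the skill lives in a folder directly under the archive root.
    Returns the list of files written.
    """
    target_root = Path(skills_dir) / skill_name
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            parts = PurePosixPath(info.filename).parts
            # Keep only files shaped like "<archive-root>/<skill_name>/<file...>".
            if info.is_dir() or len(parts) < 3 or parts[1] != skill_name:
                continue
            # Skip entries that try to escape the target directory.
            if ".." in parts:
                continue
            destination = target_root.joinpath(*parts[2:])
            destination.parent.mkdir(parents=True, exist_ok=True)
            destination.write_bytes(archive.read(info))
            extracted.append(destination)
    return extracted


# Example call, assuming main.zip was downloaded to the current directory:
# extract_skill("main.zip", "croissant", ".claude/skills")
```

The function ignores everything outside the named folder, so other content in the archive is not copied into `.claude/skills/`.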