sft-data-format

Community

Clarify data formats and pipeline metadata.

Author: HsunGong
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill helps teams validate and document data formats and metadata structures used in the SFT pipeline, reducing integration errors and improving reproducibility.

Core Features & Use Cases

  • Data format validation: Ensures input/output data follow the JSON Lines convention and the stage metadata structure.
  • Resume key handling: Verifies that idx is used consistently as the resume key across pipeline steps, so an interrupted run can pick up where it stopped.
  • Metadata documentation: Produces clear metadata schemas and examples for downstream components.
  • Use Case: When ingesting datasets into the SFT pipeline, run this skill to confirm formatting, keys, and metadata are consistent before processing.
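The checks listed above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual implementation: the function name validate_jsonl and the rules it applies (one JSON object per line, every record carries idx, no idx repeats) are assumptions drawn from the feature list. The stage metadata structure is not specified here, so it is left out.

```python
import json


def validate_jsonl(lines, resume_key="idx"):
    """Return (line_number, message) pairs for each problem found.

    Hypothetical helper: the rules below are assumptions, not the
    skill's documented behaviour.
    """
    problems = []
    seen = set()
    for lineno, raw in enumerate(lines, start=1):
        if not raw.strip():
            continue  # assumption: blank lines are tolerated
        try:
            record = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((lineno, "record is not a JSON object"))
            continue
        if resume_key not in record:
            problems.append((lineno, f"missing '{resume_key}'"))
            continue
        # Serialise the key so that unhashable values can still be compared.
        marker = json.dumps(record[resume_key], sort_keys=True)
        if marker in seen:
            problems.append((lineno, f"duplicate '{resume_key}': {marker}"))
        seen.add(marker)
    return problems


if __name__ == "__main__":
    sample = [
        '{"idx": 0, "text": "a"}',
        '{"text": "no resume key"}',
        '{"idx": 0, "text": "repeated key"}',
    ]
    for lineno, message in validate_jsonl(sample):
        print(f"line {lineno}: {message}")
```

A unique idx per record is what makes resuming safe: a later step can collect the idx values already present in its output and skip those records on restart.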

Quick Start

Use the sft-data-format skill to validate your dataset's JSON Lines formatting, idx resume keys, and think tags.
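The think-tag check mentioned above might look like the sketch below. The tag spelling (<think> and </think>) and the rule that tags must be paired and not nested are assumptions; this page does not define the exact tag format, so adjust the pattern to match your data.

```python
import re

# Assumed tag spelling; the skill's real convention may differ.
THINK_TAG = re.compile(r"</?think>")


def think_tags_balanced(text):
    """Return True if every <think> is closed by a </think>.

    Hypothetical helper: rejects nested openers and stray closers.
    """
    inside = False
    for match in THINK_TAG.finditer(text):
        if match.group() == "<think>":
            if inside:
                return False  # nested opener
            inside = True
        else:
            if not inside:
                return False  # closer without an opener
            inside = False
    return not inside  # an unclosed opener is also a failure


if __name__ == "__main__":
    print(think_tags_balanced("<think>plan</think>answer"))
    print(think_tags_balanced("<think>plan without a closer"))
```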

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install the Skill automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: sft-data-format
Download link: https://github.com/HsunGong/prep/archive/main.zip#sft-data-format

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
