Name: count-dataset-tokens
Availability: InStock
Author: Zurybr

System Documentation

What problem does it solve?

This Skill gives guidance on counting the tokens in a dataset accurately, particularly when the count must use a specific tokenizer or cover only a filtered subset of the data.

Core Features & Use Cases

Token Counting: Accurately count tokens in HuggingFace or similar datasets.
Data Filtering: Filter datasets by domain, category, or other specific fields.
Tokenizer Application: Use specified tokenizers (e.g., Qwen, DeepSeek, GPT) for precise counting.
Use Case: You need to determine the total token count of all 'technology'-related articles in a large text dataset, using the 'gpt2' tokenizer.
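Here is a minimal Python sketch of the filter-then-count step. It is not the skill's own code. The function name `count_dataset_tokens`, the `text` and `domain` field names, and the toy rows are assumptions for illustration. The tokenizer is passed in as any callable that maps a string to a sequence of tokens, so the same loop should work with a Hugging Face tokenizer loaded via `tok = AutoTokenizer.from_pretrained("gpt2")` and wrapped as `lambda t: tok.encode(t, add_special_tokens=False)`. Iterating a Hugging Face `datasets.Dataset` yields one dict per row, so such a dataset can be passed as `records` directly.

```python
from typing import Any, Callable, Iterable, Mapping, Optional, Sequence


def count_dataset_tokens(
    records: Iterable[Mapping[str, Any]],
    tokenize: Callable[[str], Sequence[Any]],
    text_field: str = "text",  # assumed field name, adjust to the dataset
    filter_field: Optional[str] = None,  # e.g. "domain" or "category" (assumed)
    filter_value: Any = None,
) -> int:
    """Return the total number of tokens across matching records.

    If filter_field is given, only records whose value for that field equals
    filter_value are counted. Records without a string in text_field are skipped.
    """
    total = 0
    for record in records:
        # Keep only the requested domain/category, if a filter was given.
        if filter_field is not None and record.get(filter_field) != filter_value:
            continue
        text = record.get(text_field)
        if not isinstance(text, str):
            continue  # missing or non-text value: nothing to tokenize
        total += len(tokenize(text))
    return total


# Toy data; str.split stands in for a real subword tokenizer.
rows = [
    {"text": "GPUs speed up training", "domain": "technology"},
    {"text": "Bread needs yeast", "domain": "cooking"},
    {"text": "Compilers translate code", "domain": "technology"},
]
print(count_dataset_tokens(rows, str.split, filter_field="domain", filter_value="technology"))  # 7
```

Passing `add_special_tokens=False` counts only the tokens of the text itself. Without it, tokenizers such as BERT's add markers like `[CLS]` and `[SEP]` to every example, which inflates the total by a fixed amount per record.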

Quick Start

Use the count-dataset-tokens skill to count tokens in the 'wikipedia' dataset with the 'bert-base-uncased' tokenizer, keeping only articles in the 'science' domain.
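In the Quick Start scenario, tokenizing one article at a time can be slow on a dataset the size of Wikipedia. Hugging Face tokenizers also accept a list of strings in a single call, so this sketch groups the filtered texts into batches before tokenizing. It is an illustration, not the skill's implementation. The `text` and `domain` field names, the default filter value, and the batch size are all assumptions, because the source does not say how the 'wikipedia' dataset is labeled by domain. With `transformers`, a matching `batch_tokenize` could be `lambda b: tok(b, add_special_tokens=False)["input_ids"]`, where `tok = AutoTokenizer.from_pretrained("bert-base-uncased")`.

```python
from typing import Any, Callable, Iterable, Iterator, List, Mapping, Sequence


def batched_texts(
    records: Iterable[Mapping[str, Any]],
    text_field: str,
    filter_field: str,
    filter_value: Any,
    batch_size: int,
) -> Iterator[List[str]]:
    """Yield lists of at most batch_size texts from records matching the filter."""
    batch: List[str] = []
    for record in records:
        if record.get(filter_field) != filter_value:
            continue
        text = record.get(text_field)
        if isinstance(text, str):
            batch.append(text)
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:  # flush the final, possibly shorter batch
        yield batch


def count_tokens_batched(
    records: Iterable[Mapping[str, Any]],
    batch_tokenize: Callable[[List[str]], Sequence[Sequence[Any]]],
    text_field: str = "text",  # assumed field name
    filter_field: str = "domain",  # assumed field name
    filter_value: Any = "science",
    batch_size: int = 1000,  # assumed; tune for memory and speed
) -> int:
    """Count tokens by sending filtered texts to the tokenizer in batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    total = 0
    for batch in batched_texts(records, text_field, filter_field, filter_value, batch_size):
        # batch_tokenize returns one token sequence per input text.
        total += sum(len(ids) for ids in batch_tokenize(batch))
    return total


rows = [
    {"text": "cells divide", "domain": "science"},
    {"text": "paint dries slowly", "domain": "art"},
    {"text": "atoms bond together", "domain": "science"},
]
# Stand-in batch tokenizer: whitespace split per text.
print(count_tokens_batched(rows, lambda batch: [t.split() for t in batch], batch_size=1))  # 5
```

Truncation is off unless requested, so articles longer than BERT's 512-token limit are still counted in full. The library may log a warning about sequence length, which is harmless when the goal is only counting.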

Please help me install this Skill: Name: count-dataset-tokens Download link: https://github.com/Zurybr/lefarma-skills/archive/main.zip#count-dataset-tokens Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
