quantization

Community

Lean, fast model quantization for inference.

Author: atrawog
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Quantizes neural network models to reduce memory footprint and accelerate inference.

Core Features & Use Cases

  • Supports FP32, FP16, BF16, INT8, INT4 precisions to balance accuracy and performance.
  • Provides BitsAndBytes-based loading configurations (load_in_4bit, nf4, fp4) and memory estimation.
  • Enables deployment on memory-constrained hardware and training with QLoRA workflows.
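As a rough illustration of how the listed precisions trade memory for accuracy, weight memory scales with bytes per parameter. The sketch below is a back-of-envelope estimator (the helper name is hypothetical, not part of this skill); real quantized checkpoints add small overheads such as per-block scales that are not counted here.

```python
# Approximate weight-memory footprint at each supported precision.
# Values are the raw storage widths of each format in bytes per parameter.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,
    "bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}

def estimate_weight_memory_gb(num_params: int, precision: str) -> float:
    """Return approximate weight memory in GiB for a given precision."""
    return num_params * BYTES_PER_PARAM[precision] / (1024 ** 3)

if __name__ == "__main__":
    params_7b = 7_000_000_000  # e.g. a 7B-parameter model
    for p in ("fp32", "fp16", "int8", "int4"):
        print(f"{p}: {estimate_weight_memory_gb(params_7b, p):.1f} GiB")
```

For a 7B-parameter model this works out to roughly 26 GiB at FP32 down to about 3.3 GiB at INT4, which is why 4-bit loading makes consumer GPUs viable.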

Quick Start

Quantize your model to 4-bit NF4 for reduced memory and faster inference.
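The actual 4-bit load goes through bitsandbytes; the sketch below only illustrates the idea behind NF4, namely per-block absmax scaling followed by nearest-neighbour lookup into a fixed 16-entry "normal float" codebook. The codebook values here are rounded approximations of the published NF4 constants, and everything else is a simplified stand-in, not this skill's implementation.

```python
# Simplified illustration of NF4-style 4-bit quantization.
# Real bitsandbytes kernels pack two 4-bit codes per byte and run on GPU;
# this sketch stores plain code indices for clarity.
NF4_CODEBOOK = [  # approximate NF4 levels, spaced for normally distributed weights
    -1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
    0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0,
]

def quantize_block(values):
    """Quantize one block of floats to (absmax scale, 4-bit code indices)."""
    scale = max(abs(v) for v in values) or 1.0  # guard against an all-zero block
    codes = [
        min(range(16), key=lambda i: abs(v / scale - NF4_CODEBOOK[i]))
        for v in values
    ]
    return scale, codes

def dequantize_block(scale, codes):
    """Reconstruct approximate floats from code indices and the block scale."""
    return [scale * NF4_CODEBOOK[c] for c in codes]

if __name__ == "__main__":
    block = [0.12, -0.5, 0.33, 0.0, 0.81, -0.07]
    scale, codes = quantize_block(block)
    approx = dequantize_block(scale, codes)
    print(codes)   # one 4-bit index per weight
    print(approx)  # lossy reconstruction of the original block
```

Only the scale is kept per block, so storage drops to about half a byte per weight plus one scale per block, at the cost of the small reconstruction error visible above.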

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install the skill automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: quantization
Download link: https://github.com/atrawog/overthink-plugins/archive/main.zip#quantization

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
