hqq-quantization
Community · Quantize LLMs fast, no calibration needed.
Software Engineering · #quantization #llm-optimization #model-compression #memory-efficiency #inference-speed #hqq
Author: DoanNgocCuong
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill enables rapid and efficient quantization of Large Language Models (LLMs) to lower bit precisions (e.g., 4-bit, 3-bit, 2-bit) without requiring calibration datasets, significantly reducing memory footprint and speeding up inference.
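To make the memory claim concrete, here is a small back-of-the-envelope sketch of weight storage for an 8B-parameter model. The overhead term assumes one scale and one zero-point per group, each stored in 16 bits; the exact packing in HQQ differs in detail, so treat the numbers as approximations.

```python
def weight_footprint_gib(n_params: float, nbits: int,
                         group_size: int = 64, meta_bits: int = 16) -> float:
    """Approximate weight memory in GiB: packed weights plus
    per-group scale and zero-point metadata (meta_bits each)."""
    weight_bits = n_params * nbits
    meta_bits_total = (n_params / group_size) * 2 * meta_bits
    return (weight_bits + meta_bits_total) / (8 * 1024**3)

fp16_gib = weight_footprint_gib(8e9, 16, meta_bits=0)  # ~14.9 GiB
int4_gib = weight_footprint_gib(8e9, 4)                # ~4.2 GiB
```

Even with group-wise metadata included, 4-bit weights cut the footprint of an 8B model to roughly 28% of fp16.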
Core Features & Use Cases
- Calibration-Free Quantization: Quantize models instantly without needing a representative dataset.
- Flexible Precision: Supports 8-bit, 4-bit, 3-bit, and 2-bit quantization with configurable group sizes.
- Optimized Backends: Integrates with high-performance backends such as Marlin, TorchAO, and BitBLAS for accelerated inference.
- Framework Integration: Seamlessly works with HuggingFace Transformers and vLLM for easy deployment.
- PEFT Compatible: Enables fine-tuning of quantized models using LoRA and other Parameter-Efficient Fine-Tuning techniques.
- Use Case: You have a large LLM that consumes too much VRAM. Use this Skill to quantize it to 4-bit precision, allowing it to run on hardware with less memory while maintaining good performance.
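For intuition, the sketch below shows plain min-max group-wise asymmetric quantization in NumPy: each group of weights gets its own scale and zero-point. Note this is only illustrative; HQQ itself refines the zero-point and scale with a half-quadratic optimization rather than using raw min/max.

```python
import numpy as np

def quantize_groupwise(w, nbits=4, group_size=64):
    """Min-max asymmetric quantization with a per-group scale and zero-point."""
    g = w.reshape(-1, group_size)
    qmax = 2**nbits - 1
    wmin = g.min(axis=1, keepdims=True)
    wmax = g.max(axis=1, keepdims=True)
    scale = np.where(wmax > wmin, (wmax - wmin) / qmax, 1.0)
    zero = -wmin / scale  # shifts wmin to quantized value 0
    q = np.clip(np.round(g / scale + zero), 0, qmax).astype(np.uint8)
    return q, scale, zero

def dequantize_groupwise(q, scale, zero, shape):
    """Reconstruct approximate float weights from codes + group metadata."""
    return ((q.astype(np.float32) - zero) * scale).reshape(shape)
```

Smaller group sizes shrink the per-group range, lowering quantization error at the cost of more scale/zero-point metadata.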
Quick Start
Use the hqq-quantization skill to quantize the 'meta-llama/Llama-3.1-8B' model to 4-bit precision.
Dependency Matrix
Required Modules
- hqq
- torch
Components
- scripts
- references
- assets
💻 Claude Code Installation
Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill: Name: hqq-quantization Download link: https://github.com/DoanNgocCuong/continuous-training-pipeline_T3_2026/archive/main.zip#hqq-quantization Please download this .zip file, extract it, and install it in the .claude/skills/ directory.