hqq-quantization

Community

Quantize LLMs fast, no calibration needed.

Author: DoanNgocCuong
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill enables rapid and efficient quantization of Large Language Models (LLMs) to lower bit precisions (e.g., 4-bit, 3-bit, 2-bit) without requiring calibration datasets, significantly reducing memory footprint and speeding up inference.

Core Features & Use Cases

  • Calibration-Free Quantization: Quantize models instantly without needing a representative dataset.
  • Flexible Precision: Supports 8-bit, 4-bit, 3-bit, and 2-bit quantization with configurable group sizes.
  • Optimized Backends: Integrates with various high-performance backends like Marlin, TorchAO, and BitBlas for accelerated inference.
  • Framework Integration: Seamlessly works with HuggingFace Transformers and vLLM for easy deployment.
  • PEFT Compatible: Enables fine-tuning of quantized models using LoRA and other Parameter-Efficient Fine-Tuning techniques.
  • Use Case: You have a large LLM that consumes too much VRAM. Use this Skill to quantize it to 4-bit precision, allowing it to run on hardware with less memory while maintaining good performance.
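To make the "configurable group sizes" and bit-width knobs above concrete, here is a minimal NumPy sketch of plain group-wise round-to-nearest quantization. It is illustrative only: HQQ itself goes further by optimizing the zero-point/scale with a half-quadratic solver, which this sketch does not implement.

```python
import numpy as np

def quantize_groupwise(w, nbits=4, group_size=64):
    """Asymmetric round-to-nearest quantization per group (illustrative sketch).

    HQQ additionally refines zero/scale via half-quadratic optimization;
    this shows only the group-size / nbits mechanics."""
    qmax = 2 ** nbits - 1
    g = w.reshape(-1, group_size)             # split flat weights into groups
    wmin = g.min(axis=1, keepdims=True)
    wmax = g.max(axis=1, keepdims=True)
    scale = (wmax - wmin) / qmax              # per-group scale
    scale = np.where(scale == 0, 1.0, scale)  # guard against constant groups
    zero = -wmin / scale                      # per-group zero point
    q = np.clip(np.round(g / scale + zero), 0, qmax).astype(np.uint8)
    return q, scale, zero

def dequantize(q, scale, zero, shape):
    """Map integer codes back to approximate float weights."""
    return ((q.astype(np.float32) - zero) * scale).reshape(shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((128, 128)).astype(np.float32)
q, s, z = quantize_groupwise(w, nbits=4, group_size=64)
w_hat = dequantize(q, s, z, w.shape)
err = np.abs(w - w_hat).max()  # reconstruction error, bounded by ~scale/2
```

Stored as 4-bit codes plus one scale and zero per 64 weights, this layout is roughly a 4x memory reduction versus float16, which is the effect the skill exploits for large models.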

Quick Start

Use the hqq-quantization skill to quantize the 'meta-llama/Llama-3.1-8B' model to 4-bit precision.
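For orientation, the kind of code the skill would produce looks roughly like the following sketch, which uses the `HqqConfig` integration in HuggingFace Transformers. The exact parameters (`nbits`, `group_size`) and device setup are illustrative assumptions, not the skill's guaranteed output; running it requires `pip install hqq`, a GPU, and access to the gated Llama weights.

```python
# Hedged sketch: 4-bit HQQ quantization through the Transformers integration.
# No calibration dataset is passed anywhere -- HQQ is calibration-free.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, HqqConfig

quant_config = HqqConfig(nbits=4, group_size=64)  # illustrative settings

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",
    torch_dtype=torch.float16,
    device_map="cuda",
    quantization_config=quant_config,  # weights are quantized at load time
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")
```

Because quantization happens at load time with no calibration pass, the whole process takes minutes rather than the hours some calibration-based methods need.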

Dependency Matrix

Required Modules

hqq
torch

Components

scripts
references
assets

💻 Claude Code Installation

Recommended: let Claude install it automatically. Simply copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: hqq-quantization
Download link: https://github.com/DoanNgocCuong/continuous-training-pipeline_T3_2026/archive/main.zip#hqq-quantization

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
View Source Repository

Agent Skills Search Helper

Install a tiny helper for your Agent to search and equip skills on demand from a library of 223,000+ vetted skills.