gptq

Community

Compress LLMs for consumer GPUs.

Author: DoanNgocCuong
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill enables the deployment of large language models (LLMs) on hardware with limited memory, such as consumer GPUs, by significantly reducing their size.

Core Features & Use Cases

  • 4-Bit Quantization: Compresses LLMs to 4-bit precision using the GPTQ algorithm, drastically cutting memory requirements.
  • Minimal Accuracy Loss: Achieves this compression with less than a 2% increase in perplexity.
  • Faster Inference: Provides a 3-4x speedup in inference compared to standard FP16 models.
  • Use Case: Serving a 30B-class LLM on a single RTX 4090 GPU (24 GB VRAM) for real-time chat, which is impossible at FP16 (roughly 60 GB of weights alone; note that even at 4-bit, a 70B model still needs about 35 GB and will not fit in 24 GB).
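To make the memory arithmetic behind the feature list concrete, here is a minimal sketch of round-to-nearest 4-bit group quantization in plain Python, with made-up weights. This is only an illustration of the storage format and the 4x size reduction; GPTQ proper additionally minimizes layer-wise reconstruction error using second-order (Hessian) information, which this sketch omits.

```python
# Illustrative 4-bit per-group quantization (round-to-nearest absmax).
# GPTQ proper adds Hessian-based error compensation; this sketch only
# shows the storage format and the memory arithmetic.

def quantize_group(weights, n_bits=4):
    """Quantize one group of float weights to signed n-bit integers."""
    qmax = 2 ** (n_bits - 1) - 1                  # 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize_group(q, scale):
    """Reconstruct approximate float weights from the 4-bit codes."""
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.07, 0.91, -0.88, 0.33, -0.02, 0.44]
q, scale = quantize_group(weights)
recon = dequantize_group(q, scale)

# 4-bit ints pack two per byte: these 8 weights take 4 bytes instead of
# 16 bytes in FP16 -- roughly 4x smaller, plus one FP16 scale per group.
max_err = max(abs(w - r) for w, r in zip(weights, recon))
print(q, round(scale, 4), round(max_err, 4))
```

In real GPTQ checkpoints the group size is typically 128, so the per-group scale adds only a small overhead on top of the packed 4-bit weights.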

Quick Start

Use the gptq skill to load the quantized model 'TheBloke/Llama-2-7B-Chat-GPTQ' onto your CUDA device.

Dependency Matrix

Required Modules

  • auto-gptq
  • transformers
  • optimum
  • peft

Components

  • scripts
  • references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: gptq
Download link: https://github.com/DoanNgocCuong/continuous-training-pipeline_T3_2026/archive/main.zip#gptq

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
