llama-cpp
Community
CPU-first LLM inference on non-NVIDIA hardware.
Author: ovachiever
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
It enables pure C/C++ LLM inference on CPUs, Apple Silicon, and non-NVIDIA GPUs, using GGUF quantization to cut memory use and improve inference speed.
Core Features & Use Cases
- Inference without CUDA: plain CPUs, Apple Silicon, and AMD/Intel GPUs
- GGUF quantization (1.5-8 bit) for efficient memory usage (rough size estimates sketched below)
- Edge and lightweight deployment scenarios
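To make the quantization savings concrete, here is a back-of-envelope estimate in plain Python (no llama.cpp dependency) for a hypothetical 7B-parameter model at a few common GGUF bit widths. The bits-per-weight figures are approximate, and the math ignores per-block scale metadata and the KV cache, so treat the results as rough sizing guidance only.

```python
# Approximate weight-memory footprint at different GGUF quantization levels.
# Ignores per-block scale/zero-point metadata and the KV cache, both of which
# add real-world overhead; bits-per-weight values below are rough averages.

def approx_weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

n_params = 7e9  # hypothetical 7B-parameter model

for label, bpw in [
    ("F16 (unquantized)", 16.0),
    ("Q8_0 (~8.5 bpw)", 8.5),
    ("Q4_K_M (~4.8 bpw)", 4.8),
    ("IQ2_XS (~2.3 bpw)", 2.3),
]:
    print(f"{label:20s} ~ {approx_weight_size_gb(n_params, bpw):5.1f} GB")

# Roughly: F16 ~ 14.0 GB, Q8_0 ~ 7.4 GB, Q4_K_M ~ 4.2 GB, IQ2_XS ~ 2.0 GB
```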
Quick Start
Install llama.cpp, download a GGUF model, and run the CLI for offline or server-based inference.
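A minimal sketch of the offline path using the llama-cpp-python module listed under Required Modules. The model path and prompt are placeholders; any locally downloaded GGUF file will work.

```python
# Minimal offline inference sketch with llama-cpp-python.
# Assumes `pip install llama-cpp-python` and a locally downloaded GGUF file;
# the path below is a placeholder -- point it at any quantized GGUF model.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/model-q4_k_m.gguf",  # placeholder path
    n_ctx=4096,        # context window
    n_threads=8,       # CPU threads; tune to your machine
    n_gpu_layers=0,    # 0 = pure CPU; raise if a Metal/Vulkan build is available
)

out = llm(
    "Q: Name the planets in the solar system. A:",
    max_tokens=64,
    stop=["Q:", "\n\n"],
    echo=False,
)
print(out["choices"][0]["text"])
```

For server-based inference, llama-cpp-python also ships an OpenAI-compatible HTTP server: install with `pip install 'llama-cpp-python[server]'` and run `python -m llama_cpp.server --model <path-to-model.gguf>`.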
Dependency Matrix
Required Modules
llama-cpp-python
Components
references
💻 Claude Code Installation
Recommended: let Claude install it automatically. Simply copy and paste the text below into Claude Code.
Please help me install this Skill:
Name: llama-cpp
Download link: https://github.com/ovachiever/droid-tings/archive/main.zip#llama-cpp
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.