llama-cpp

Community

CPU-first LLM inference on non-NVIDIA hardware.

Author: ovachiever
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

llama.cpp runs LLM inference in pure C/C++ on CPUs, Apple Silicon, and non-NVIDIA GPUs, using GGUF quantization to reduce memory use and speed up generation on commodity hardware.

Core Features & Use Cases

  • Inference without CUDA: CPUs, Apple Silicon (Metal), and AMD/Intel GPUs
  • GGUF quantization (1.5-8 bit) for a small memory footprint (see the sketch after this list)
  • Edge and lightweight deployment scenarios
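To give a sense of why quantization matters, here is a rough back-of-envelope sketch of weight memory at different bit widths. The bits-per-weight figures and the 7B parameter count are illustrative assumptions, not measurements; real GGUF files add overhead for scales and metadata.

```python
# Rough GGUF memory estimate: parameters x bits-per-weight / 8.
# Figures are illustrative approximations; actual GGUF files are somewhat
# larger because quantization blocks store scales and other metadata.

def approx_size_gib(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GiB for a quantized model."""
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / (1024 ** 3)

for label, bpw in [("FP16", 16), ("Q8_0", 8), ("~4.5 bpw (4-bit K-quant)", 4.5), ("~2.5 bpw (2-bit)", 2.5)]:
    print(f"7B model @ {label}: ~{approx_size_gib(7, bpw):.1f} GiB")
```

Dropping from FP16 to a 4-bit quant cuts a 7B model from roughly 13 GiB to under 4 GiB, which is what makes laptop and edge deployment practical.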

Quick Start

Install llama.cpp (or the llama-cpp-python binding listed below), download a GGUF model, and run it from the CLI for offline use or behind a local server. A minimal Python example follows.
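A minimal sketch using the llama-cpp-python binding from the Required Modules list. The model path and prompt are placeholders; substitute any GGUF checkpoint you have downloaded.

```python
# Sketch: local inference with llama-cpp-python (pip install llama-cpp-python).
# "model.gguf" is a placeholder for a downloaded GGUF checkpoint.
from llama_cpp import Llama

llm = Llama(
    model_path="model.gguf",  # path to a quantized GGUF model
    n_ctx=2048,               # context window size
    n_gpu_layers=-1,          # offload all layers to Metal/GPU if available; 0 = CPU only
)

result = llm(
    "Q: What is GGUF quantization? A:",
    max_tokens=128,
    stop=["Q:"],
)
print(result["choices"][0]["text"])
```

For the server-based path, llama-cpp-python also ships an OpenAI-compatible server (started with `python -m llama_cpp.server --model model.gguf`), which exposes the same model over HTTP.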

Dependency Matrix

Required Modules

llama-cpp-python

Components

references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: llama-cpp
Download link: https://github.com/ovachiever/droid-tings/archive/main.zip#llama-cpp

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.