llama-cpp
Community · Run LLMs efficiently on any hardware.
Author: AXGZ21
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill enables running large language models (LLMs) on standard hardware, including CPUs, Apple Silicon, and non-NVIDIA GPUs, overcoming the limitations of CUDA-dependent solutions.
Core Features & Use Cases
- CPU & Edge Inference: Optimized for running LLMs on consumer-grade CPUs and edge devices.
- Apple Silicon Support: Leverages Metal for efficient inference on M1/M2/M3 Macs.
- Non-NVIDIA GPU Support: Supports AMD GPUs via HIP/ROCm, Intel GPUs via SYCL, and other hardware through the portable Vulkan backend.
- GGUF Quantization: Utilizes quantized model formats for reduced memory footprint and faster inference.
- Use Case: Deploying a chatbot on a laptop or a Raspberry Pi, or running LLMs on a workstation without an NVIDIA GPU.
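To make the memory-footprint point concrete, here is a back-of-envelope estimate of weight memory for a 7B-parameter model. The figures are illustrative assumptions: Q4_K_M is a mixed-precision format averaging roughly 4.5 bits per weight, compared with 16 bits for FP16; actual usage also includes KV cache and activations.

```python
# Back-of-envelope weight-memory estimate (illustrative figures only).
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for model weights, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30

n_params = 7e9  # ~7 billion parameters

fp16 = weight_memory_gib(n_params, 16.0)
q4_k_m = weight_memory_gib(n_params, 4.5)  # assumed Q4_K_M average

print(f"FP16:   {fp16:.1f} GiB")
print(f"Q4_K_M: {q4_k_m:.1f} GiB (~{fp16 / q4_k_m:.1f}x smaller)")
```

This is why a quantized 7B model fits comfortably in the RAM of a typical laptop, while the FP16 weights alone would strain a 16 GB machine.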
Quick Start
Use the llama-cpp skill to run inference with the model located at 'models/llama-2-7b-chat.Q4_K_M.gguf' and explain quantum computing.
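Under the hood, a request like this maps onto llama.cpp's command-line tool. A minimal invocation sketch, assuming llama.cpp is built and `llama-cli` is on your PATH (the prompt text and generation length here are illustrative):

```shell
# Run the quantized chat model from the Quick Start example.
# -n limits generated tokens; -ngl offloads layers to the GPU where a
# GPU backend (Metal, Vulkan, ROCm, ...) is available.
llama-cli \
  -m models/llama-2-7b-chat.Q4_K_M.gguf \
  -p "Explain quantum computing in simple terms." \
  -n 256 \
  -ngl 99
```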
Dependency Matrix
Required Modules
None required
Components
references
💻 Claude Code Installation
Recommended: Let Claude install the Skill automatically. Simply copy and paste the text below into Claude Code.
Please help me install this Skill: Name: llama-cpp Download link: https://github.com/AXGZ21/hermes-agent-railway/archive/main.zip#llama-cpp Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
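If you prefer to install manually, the same steps can be sketched in the shell. The path to the skill directory inside the extracted archive is an assumption about the repository layout; adjust it to match what `unzip` actually produces.

```shell
# Manual install sketch: download the archive, extract it, and copy the
# skill into .claude/skills/. The in-archive path is an assumption.
curl -L -o llama-cpp-skill.zip \
  https://github.com/AXGZ21/hermes-agent-railway/archive/main.zip
unzip -q llama-cpp-skill.zip
mkdir -p .claude/skills
cp -r hermes-agent-railway-main/llama-cpp .claude/skills/
```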