universal-inference-runtime
Community
Unified AI model inference across backends.
Author: AmitabhainArunachala
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill simplifies AI model deployment and inference by providing a single, unified API that works across multiple backends like Ollama and llama.cpp, eliminating vendor lock-in and hardware constraints.
Core Features & Use Cases
- Multi-Backend Support: Seamlessly switch between Ollama, llama.cpp, and potentially vLLM or cloud providers.
- Hot Model Swapping: Change AI models on the fly without restarting the application.
- Hardware Agnostic: Automatically detects and utilizes CUDA, ROCm, Metal/MPS, or CPU.
- Use Case: A developer needs to test a prompt against Gemma and then Llama 3.1 models. They can load Gemma using Ollama, get a response, and then instantly swap to Llama 3.1 without reconfiguring their environment.
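The hot-swap workflow above can be sketched as a small backend registry. This is a minimal, hypothetical illustration: the `InferenceRuntime` class, its `load`/`generate` methods, and the stub backends are assumptions for the sketch, not the Skill's actual API, and real backends would call Ollama or llama-cpp-python instead of echoing.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Backend:
    name: str
    generate: Callable[[str, str], str]  # (model, prompt) -> response text

def _ollama_generate(model: str, prompt: str) -> str:
    # Stub: a real backend would POST to Ollama's HTTP API here.
    return f"[ollama/{model}] echo: {prompt}"

def _llamacpp_generate(model: str, prompt: str) -> str:
    # Stub: a real backend would call llama-cpp-python here.
    return f"[llama.cpp/{model}] echo: {prompt}"

class InferenceRuntime:
    """Hypothetical unified runtime: one API, pluggable backends."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {
            "ollama": Backend("ollama", _ollama_generate),
            "llama.cpp": Backend("llama.cpp", _llamacpp_generate),
        }
        self._backend = self._backends["ollama"]
        self._model = ""

    def load(self, model: str, backend: str = "ollama") -> None:
        # Hot swap: switch backend and model without restarting the process.
        self._backend = self._backends[backend]
        self._model = model

    def generate(self, prompt: str) -> str:
        return self._backend.generate(self._model, prompt)

runtime = InferenceRuntime()
runtime.load("gemma3:4b", backend="ollama")
first = runtime.generate("Hello!")
runtime.load("llama3.1:8b", backend="llama.cpp")  # swap, no reconfiguration
second = runtime.generate("Hello!")
```

The key design point is that callers hold one `runtime` object; which engine serves the request is an internal detail, so swapping models or backends never touches application code.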
Quick Start
Initialize the runtime, load the 'gemma3:4b' model, and generate a response to 'Hello!'.
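A hedged sketch of that quick start using `requests` (one of the listed dependencies) against Ollama's `/api/generate` endpoint directly. The server URL assumes Ollama's default local port; the Skill's own wrapper API is not documented here, so this shows only the underlying call.

```python
import json

# Default local Ollama endpoint (assumption: stock install, port 11434).
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> dict:
    # stream=False asks Ollama for one JSON object instead of chunked output.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    import requests  # requests>=2.28.0 per the dependency matrix
    resp = requests.post(OLLAMA_URL, json=build_payload(model, prompt), timeout=120)
    resp.raise_for_status()
    return resp.json()["response"]

payload = build_payload("gemma3:4b", "Hello!")
print(json.dumps(payload))

# With an Ollama server running and the model pulled, uncomment:
# print(generate("gemma3:4b", "Hello!"))
```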
Dependency Matrix
Required Modules
- requests>=2.28.0
- llama-cpp-python>=0.2.0
- numpy>=1.24.0
Components
- scripts
- references
- assets
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: universal-inference-runtime
Download link: https://github.com/AmitabhainArunachala/clawd/archive/main.zip#universal-inference-runtime
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.