universal-inference-runtime


Unified AI model inference across backends.

Author: AmitabhainArunachala
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill simplifies AI model deployment and inference by providing a unified API that works across multiple backends such as Ollama and llama.cpp, reducing vendor lock-in and hardware constraints.

Core Features & Use Cases

  • Multi-Backend Support: Seamlessly switch between Ollama, llama.cpp, and potentially vLLM or cloud providers.
  • Hot Model Swapping: Change AI models on the fly without restarting the application.
  • Hardware Agnostic: Automatically detects and utilizes CUDA, ROCm, Metal/MPS, or CPU.
  • Use Case: A developer needs to test a prompt against Gemma and then Llama 3.1 models. They can load Gemma using Ollama, get a response, and then instantly swap to Llama 3.1 without reconfiguring their environment.
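The multi-backend and hot-swapping behavior described above can be sketched as a backend registry behind one `generate()` call. This is a minimal illustration, not the Skill's actual implementation: the class and method names (`Runtime`, `load`, `generate`) and the stub adapters are assumptions; a real adapter would talk to Ollama's HTTP API or `llama-cpp-python` instead of returning placeholder strings.

```python
from typing import Protocol, Optional


class Backend(Protocol):
    """Minimal interface every backend adapter implements."""
    def generate(self, model: str, prompt: str) -> str: ...


class OllamaBackend:
    def generate(self, model: str, prompt: str) -> str:
        # A real adapter would POST to Ollama's /api/generate endpoint.
        return f"[ollama:{model}] response to {prompt!r}"


class LlamaCppBackend:
    def generate(self, model: str, prompt: str) -> str:
        # A real adapter would call llama_cpp.Llama(...).create_completion(...).
        return f"[llama.cpp:{model}] response to {prompt!r}"


class Runtime:
    """Unified entry point: one API, pluggable backends, hot model swapping."""

    def __init__(self) -> None:
        self._backends: dict[str, Backend] = {
            "ollama": OllamaBackend(),
            "llama.cpp": LlamaCppBackend(),
        }
        self._backend = "ollama"
        self._model: Optional[str] = None

    def load(self, model: str, backend: str = "ollama") -> None:
        # Swapping models needs no restart: just update the active pair.
        self._backend, self._model = backend, model

    def generate(self, prompt: str) -> str:
        if self._model is None:
            raise RuntimeError("call load() before generate()")
        return self._backends[self._backend].generate(self._model, prompt)
```

With this shape, the use case above is two `load()` calls: load Gemma, generate, then `load("llama3.1:8b")` and generate again, with no environment reconfiguration in between.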

Quick Start

Initialize the runtime and load the 'gemma3:4b' model to generate a response to 'Hello!'.
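The Skill's own quick-start API is not documented on this page, so as a hedged sketch, here is what that step looks like against the Ollama backend directly: building the request body for Ollama's `/api/generate` endpoint (a real endpoint, served on port 11434 by default). The `generate()` helper is illustrative and requires a running Ollama server with `gemma3:4b` pulled; it is not executed here.

```python
import json

# Request body for Ollama's /api/generate endpoint (non-streaming).
payload = {
    "model": "gemma3:4b",   # model tag, e.g. after `ollama pull gemma3:4b`
    "prompt": "Hello!",
    "stream": False,        # return one JSON object instead of a stream
}


def generate(host: str = "http://localhost:11434") -> str:
    """Send the request; needs a running Ollama server (not called here)."""
    import requests  # listed in the dependency matrix above
    resp = requests.post(f"{host}/api/generate", json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["response"]


print(json.dumps(payload, indent=2))
```

The runtime's unified API would wrap this same payload (and the equivalent `llama-cpp-python` call) behind a single function.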

Dependency Matrix

Required Modules

requests>=2.28.0
llama-cpp-python>=0.2.0
numpy>=1.24.0

Components

  • scripts
  • references
  • assets

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: universal-inference-runtime
Download link: https://github.com/AmitabhainArunachala/clawd/archive/main.zip#universal-inference-runtime

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
