inference-deploy

Deploy models for inference.

Author: Rachasumanth
Version: 1.0.0
Category: Community
Installs: 0

System Documentation

What problem does it solve?

This Skill streamlines the process of deploying trained machine learning models for inference, making them accessible for real-time predictions and applications.

Core Features & Use Cases

  • Multiple Serving Frameworks: Supports vLLM, TGI, Ollama, and llama.cpp for flexible deployment.
  • Quantization: Enables model optimization through various quantization techniques (GGUF, GPTQ, AWQ).
  • API Endpoint Setup: Configures OpenAI-compatible API endpoints for easy integration.
  • Containerization: Generates Docker configurations for reproducible deployments.
  • Performance Validation: Includes load testing to benchmark inference speed and throughput.
  • Use Case: Deploy a fine-tuned LLM for a customer support chatbot using vLLM for high throughput and an OpenAI-compatible API.
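As a sketch of what the "OpenAI-compatible API" feature above implies, the helper below builds a standard chat-completion request payload. The model name, message, and endpoint in the usage note are placeholders for illustration, not values taken from this skill.

```python
import json


def build_chat_request(model: str, user_message: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-compatible /v1/chat/completions payload.

    The same JSON shape works against vLLM's and TGI's
    OpenAI-compatible servers; "model" names your deployed model.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }


# Example payload for a hypothetical support-chatbot deployment.
payload = build_chat_request("my-llm", "How do I reset my password?")
print(json.dumps(payload, indent=2))
```

POSTing this JSON to a running server's /v1/chat/completions endpoint (for example http://localhost:8000/v1/chat/completions) returns a completion; any OpenAI client library pointed at that base URL works unchanged.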

Quick Start

Give Claude a prompt such as:

Use the inference-deploy skill to deploy the model located at '/models/my-llm' using vLLM.
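Under the hood, a vLLM deployment like the one in the prompt above typically reduces to launching vLLM's OpenAI-compatible server. The sketch below assembles that command; the port is an assumed default, and the model path is the one from the example prompt.

```python
def vllm_serve_command(model_path: str, port: int = 8000) -> list[str]:
    """Assemble the `vllm serve` command that exposes an
    OpenAI-compatible API for a local model directory."""
    return ["vllm", "serve", model_path, "--port", str(port)]


cmd = vllm_serve_command("/models/my-llm")
print(" ".join(cmd))
# With vllm installed, one could launch it via subprocess.Popen(cmd).
```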

Dependency Matrix

Required Modules

vllm, text-generation-inference, huggingface_hub, auto-gptq, autoawq

Components

scripts, references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: inference-deploy
Download link: https://github.com/Rachasumanth/text2llm001/archive/main.zip#inference-deploy

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
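The manual equivalent of those install steps can be sketched as follows, using only the standard library. The archive URL is the one given above (without its fragment), and the target directory is .claude/skills/ as stated; the download call is shown in a comment rather than executed here.

```python
import io
import pathlib
import urllib.request
import zipfile

SKILL_NAME = "inference-deploy"
ZIP_URL = "https://github.com/Rachasumanth/text2llm001/archive/main.zip"


def skills_dir(home: pathlib.PurePath) -> pathlib.PurePath:
    """Resolve the Claude Code skills directory under a home folder."""
    return home / ".claude" / "skills"


def install_skill(url: str, dest: pathlib.Path) -> None:
    """Download the repository archive and extract it into dest."""
    dest.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as resp:
        archive = zipfile.ZipFile(io.BytesIO(resp.read()))
    archive.extractall(dest)


# Usage (performs a real download):
# install_skill(ZIP_URL, pathlib.Path(skills_dir(pathlib.Path.home())))
print(skills_dir(pathlib.PurePosixPath("/home/user")))
```

After extraction, the skill's folder (containing scripts and references) should sit directly under .claude/skills/ so Claude Code can discover it.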