inference-deploy

Deploy models for inference.

Author: Rachasumanth
Version: 1.0.0
Category: Community
Installs: 0

System Documentation

What problem does it solve?

This Skill streamlines the process of deploying trained machine learning models for inference, making them accessible for real-time predictions and applications.

Core Features & Use Cases

  • Multiple Serving Frameworks: Supports vLLM, TGI, Ollama, and llama.cpp for flexible deployment.
  • Quantization: Enables model optimization through various quantization techniques (GGUF, GPTQ, AWQ).
  • API Endpoint Setup: Configures OpenAI-compatible API endpoints for easy integration.
  • Containerization: Generates Docker configurations for reproducible deployments.
  • Performance Validation: Includes load testing to benchmark inference speed and throughput.
  • Use Case: Deploy a fine-tuned LLM for a customer support chatbot using vLLM for high throughput and an OpenAI-compatible API.
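As a sketch of what the "OpenAI-compatible API" feature above implies, the helper below builds a standard chat-completion request payload. The model name, message, and endpoint in the usage note are placeholders for illustration, not values taken from this skill.

```python
import json


def build_chat_request(model: str, user_message: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-compatible /v1/chat/completions payload.

    The same JSON shape works against vLLM's and TGI's
    OpenAI-compatible servers; "model" names your deployed model.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }


# Example payload for a hypothetical support-chatbot deployment.
payload = build_chat_request("my-llm", "How do I reset my password?")
print(json.dumps(payload, indent=2))
```

POSTing this JSON to a running server's /v1/chat/completions endpoint (for example http://localhost:8000/v1/chat/completions) returns a completion; any OpenAI client library pointed at that base URL works unchanged.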

Quick Start

Give Claude a prompt such as:

Use the inference-deploy skill to deploy the model located at '/models/my-llm' using vLLM.
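Under the hood, a vLLM deployment like the one in the prompt above typically reduces to launching vLLM's OpenAI-compatible server. The sketch below assembles that command; the port is an assumed default, and the model path is the one from the example prompt.

```python
def vllm_serve_command(model_path: str, port: int = 8000) -> list[str]:
    """Assemble the `vllm serve` command that exposes an
    OpenAI-compatible API for a local model directory."""
    return ["vllm", "serve", model_path, "--port", str(port)]


cmd = vllm_serve_command("/models/my-llm")
print(" ".join(cmd))
# With vllm installed, one could launch it via subprocess.Popen(cmd).
```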

Dependency Matrix

Required Modules

vllm, text-generation-inference, huggingface_hub, auto-gptq, autoawq

Components

scripts, references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: inference-deploy
Download link: https://github.com/Rachasumanth/text2llm001/archive/main.zip#inference-deploy

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
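The manual equivalent of those install steps can be sketched as follows, using only the standard library. The archive URL is the one given above (without its fragment), and the target directory is .claude/skills/ as stated; the download call is shown in a comment rather than executed here.

```python
import io
import pathlib
import urllib.request
import zipfile

SKILL_NAME = "inference-deploy"
ZIP_URL = "https://github.com/Rachasumanth/text2llm001/archive/main.zip"


def skills_dir(home: pathlib.PurePath) -> pathlib.PurePath:
    """Resolve the Claude Code skills directory under a home folder."""
    return home / ".claude" / "skills"


def install_skill(url: str, dest: pathlib.Path) -> None:
    """Download the repository archive and extract it into dest."""
    dest.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as resp:
        archive = zipfile.ZipFile(io.BytesIO(resp.read()))
    archive.extractall(dest)


# Usage (performs a real download):
# install_skill(ZIP_URL, pathlib.Path(skills_dir(pathlib.Path.home())))
print(skills_dir(pathlib.PurePosixPath("/home/user")))
```

After extraction, the skill's folder (containing scripts and references) should sit directly under .claude/skills/ so Claude Code can discover it.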