model-deployment

Community

Deploy fine-tuned models to production with ease.

Author: ScientiaCapital
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

Export and deploy fine-tuned models to production.

Core Features & Use Cases

  • GGUF export: export fine-tuned models for local or edge inference with llama.cpp/Ollama.
  • Production deployment: set up and run high-throughput serving with vLLM, Ollama, or Docker-based deployments.
  • Hub sharing & versioning: publish and version models on HuggingFace Hub for collaboration and reuse.
  • Use Case: A medical domain team finishes fine-tuning a model and deploys it to a vLLM server behind a load balancer for 24/7 API access.
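The export and Hub-sharing features above can be sketched as one small helper. `save_pretrained_gguf` appears in this skill's own Quick Start; `push_to_hub_gguf` follows the same Unsloth naming convention but is an assumption here, and the repo id is a hypothetical example — verify both against your installed Unsloth version:

```python
# Sketch of the export-and-share flow (Unsloth-style API; push_to_hub_gguf
# is an assumed method name -- check your Unsloth version before relying on it).

def export_and_share(model, tokenizer, out_dir="./gguf_output",
                     repo_id=None, quant="q4_k_m"):
    """Export a fine-tuned model to GGUF and optionally push it to the Hub."""
    # q4_k_m is a common 4-bit quantization: a good quality/size trade-off
    # for local llama.cpp/Ollama inference.
    model.save_pretrained_gguf(out_dir, tokenizer, quantization_method=quant)
    if repo_id:  # e.g. "your-org/medical-model-gguf" (hypothetical)
        model.push_to_hub_gguf(repo_id, tokenizer, quantization_method=quant)
    return out_dir
```

Keeping quantization and Hub publishing behind one call makes it easy to version each export consistently.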

Quick Start

  1. Export the fine-tuned model to GGUF, e.g., model.save_pretrained_gguf('./gguf_output', tokenizer, quantization_method='q4_k_m').
  2. Deploy with Ollama or vLLM: for Ollama, ollama create my-model -f Modelfile; for vLLM, start the server with the appropriate model path.
  3. Optionally push the model to HuggingFace Hub for sharing and versioning.
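The steps above can be sketched as a short script. The GGUF filename, the merged-checkpoint directory, and the Modelfile parameters below are hypothetical examples, not values the skill guarantees:

```python
from pathlib import Path

# Hypothetical paths -- substitute whatever step 1 actually produced.
gguf_path = "./gguf_output/model.Q4_K_M.gguf"   # GGUF export, for Ollama
hf_model_dir = "./merged_model"                 # merged fp16 checkpoint, for vLLM

# Step 2a: write a minimal Ollama Modelfile pointing at the GGUF file.
modelfile = (
    f"FROM {gguf_path}\n"
    "PARAMETER temperature 0.2\n"
    "SYSTEM You are a helpful domain assistant.\n"
)
Path("Modelfile").write_text(modelfile)
print("ollama create my-model -f Modelfile")

# Step 2b: vLLM serves the HF-format checkpoint over an OpenAI-compatible API;
# it expects the original (merged) checkpoint rather than the GGUF export.
print(f"vllm serve {hf_model_dir} --port 8000")
```

Note the split: Ollama consumes the quantized GGUF for local/edge inference, while vLLM serves the full-precision checkpoint for high-throughput API traffic.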

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install the skill automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: model-deployment
Download link: https://github.com/ScientiaCapital/unsloth-mcp-server/archive/main.zip#model-deployment

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
