model-deployment
Community
Deploy fine-tuned models to production with ease.
Author: ScientiaCapital
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
Export and deploy fine-tuned models to production.
Core Features & Use Cases
- GGUF export: export fine-tuned models for local or edge inference with llama.cpp/Ollama.
- Production deployment: set up and run high-throughput serving with vLLM, Ollama, or Docker-based deployments.
- Hub sharing & versioning: publish and version models on HuggingFace Hub for collaboration and reuse.
- Use Case: A medical domain team finishes fine-tuning a model and deploys it to a vLLM server behind a load balancer for 24/7 API access.
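For the serving scenario above, a minimal sketch of how one might assemble the vLLM launch command. The helper name, model path, and flag values are illustrative assumptions, not part of this Skill; check vllm serve --help for the flags your vLLM version supports.

```python
import shlex

# Hypothetical helper: builds the argv for vLLM's OpenAI-compatible
# server ("vllm serve <model>").  Port and context-length values are
# example defaults, not recommendations from this Skill.
def vllm_serve_command(model_path: str, port: int = 8000,
                       max_model_len: int = 4096) -> list[str]:
    return [
        "vllm", "serve", model_path,
        "--port", str(port),
        "--max-model-len", str(max_model_len),
    ]

# Print the command to run on each serving host behind the load balancer.
print(shlex.join(vllm_serve_command("./my-finetuned-model")))
```

Each backend instance runs one such server; the load balancer then distributes API traffic across them.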
Quick Start
- Export the fine-tuned model to GGUF, e.g., model.save_pretrained_gguf('./gguf_output', tokenizer, quantization_method='q4_k_m').
- Deploy with Ollama or vLLM: for Ollama, ollama create my-model -f Modelfile; for vLLM, run vllm serve <path-to-model> to start an OpenAI-compatible server.
- Optionally push the model to HuggingFace Hub for sharing and versioning.
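The Ollama step above needs a Modelfile pointing at the exported GGUF weights. A minimal sketch, assuming the export produced a file under ./gguf_output (the exact GGUF filename is an assumption; use whatever save_pretrained_gguf actually wrote):

```python
from pathlib import Path

# Hypothetical helper: writes a minimal Ollama Modelfile whose FROM
# line points at the exported GGUF file.  A real Modelfile may also
# set TEMPLATE/PARAMETER directives; this is the bare minimum.
def write_modelfile(gguf_path: str, out: str = "Modelfile") -> str:
    content = f"FROM {gguf_path}\n"
    Path(out).write_text(content)
    return content

modelfile = write_modelfile("./gguf_output/model-q4_k_m.gguf")
# Then register it with: ollama create my-model -f Modelfile
```

After ollama create succeeds, the model is available locally via ollama run my-model or the Ollama HTTP API.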
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: model-deployment
Download link: https://github.com/ScientiaCapital/unsloth-mcp-server/archive/main.zip#model-deployment
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.