vllm-omni-perf
Community skill: speed up vLLM-Omni with benchmarks and tuning.
Author: hsliuustc0106
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
vLLM-Omni performance tuning helps engineers identify and reduce bottlenecks across autoregressive and diffusion pipelines, enabling faster inference, lower latency, and better resource utilization.
Core Features & Use Cases
- Benchmarking suite for end-to-end latency and throughput across models and hardware.
- Optimization levers including TeaCache, Cache-DiT, quantization, CPU offloading, and parallelism tuning.
- Use Case: A data science team benchmarks a DiT-based diffusion model before and after applying optimizations to quantify speedups.
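The before/after comparison in the use case above can be sketched with a small helper. This is an illustrative snippet, not part of the skill itself: it only shows how a team might quantify a speedup from two sets of measured per-request latencies (the numbers below are made up).

```python
import statistics


def summarize_speedup(baseline_s, optimized_s):
    """Compare two lists of per-request latencies (in seconds)
    and report median latency before/after plus the speedup ratio."""
    base = statistics.median(baseline_s)
    opt = statistics.median(optimized_s)
    return {
        "baseline_median_s": base,
        "optimized_median_s": opt,
        "speedup": base / opt,
    }


# Hypothetical measurements from a DiT model before and after tuning.
result = summarize_speedup(
    baseline_s=[2.0, 2.1, 1.9],
    optimized_s=[1.0, 1.1, 0.9],
)
print(result)  # speedup of 2.0 for these sample numbers
```

Reporting the median rather than the mean keeps one slow outlier request from skewing the comparison.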
Quick Start
Run a baseline benchmark on a sample model to establish a reference point before applying optimizations.
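A baseline run of this kind can be sketched as a simple timing harness. The snippet below is a minimal, self-contained illustration: `fake_generate` is a stand-in for a real vLLM-Omni inference call (the actual API is not shown here), and the harness measures p50 latency and throughput after a short warmup.

```python
import statistics
import time


def fake_generate(prompt):
    """Placeholder for a real vLLM-Omni inference call."""
    time.sleep(0.001)
    return prompt[::-1]


def benchmark(fn, prompts, warmup=2):
    """Run fn over prompts, discarding warmup iterations,
    and return median latency and overall throughput."""
    for p in prompts[:warmup]:
        fn(p)  # warm caches / JIT before measuring
    latencies = []
    start = time.perf_counter()
    for p in prompts:
        t0 = time.perf_counter()
        fn(p)
        latencies.append(time.perf_counter() - t0)
    wall = time.perf_counter() - start
    return {
        "p50_latency_s": statistics.median(latencies),
        "throughput_rps": len(prompts) / wall,
    }


stats = benchmark(fake_generate, ["hello world"] * 10)
print(stats)
```

Swapping `fake_generate` for an actual model call gives a repeatable baseline to compare against after enabling TeaCache, quantization, or other levers.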
Dependency Matrix
Required Modules: none
Components: scripts, references
💻 Claude Code Installation
Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.
Please help me install this Skill. Name: vllm-omni-perf. Download link: https://github.com/hsliuustc0106/vllm-omni-skills/archive/main.zip#vllm-omni-perf. Please download this .zip file, extract it, and install it in the .claude/skills/ directory.