uv-sglang


Fast LLM serving with prefix caching.

Author: uv-xiao
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill addresses the performance bottlenecks in serving Large Language Models (LLMs), particularly for repetitive tasks like agentic workflows and structured output generation, by significantly accelerating inference speed and improving throughput.

Core Features & Use Cases

  • High-Performance Serving: Delivers up to 5x higher throughput than vLLM on workloads with shared prefixes, thanks to RadixAttention prefix caching.
  • Structured Generation: Excels at generating JSON, regex-constrained, or grammar-based outputs, crucial for agent tool calls and data parsing.
  • Agentic Workflows: Optimizes multi-turn conversations and agent interactions by reusing KV caches for system prompts and conversation history.
  • Use Case: Building an AI agent that repeatedly calls tools. SGLang's RadixAttention caches the tool definitions and system prompt, drastically reducing latency for subsequent tool calls compared to standard serving frameworks.
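As a sketch of the structured-generation use case above, the snippet below builds a JSON-schema-constrained chat request for SGLang's OpenAI-compatible endpoint (served at `http://localhost:30000/v1` by default). The schema, model name, and messages are illustrative, not part of the Skill itself, and the exact `response_format` field names may vary by SGLang version:

```python
import json

# Illustrative schema for a tool call; constrained decoding guarantees
# the model's output parses against it.
tool_call_schema = {
    "type": "object",
    "properties": {
        "tool": {"type": "string"},
        "arguments": {"type": "object"},
    },
    "required": ["tool", "arguments"],
}

# Request body for POST /v1/chat/completions on the SGLang server.
payload = {
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a tool-calling agent."},
        {"role": "user", "content": "Look up the weather in Paris."},
    ],
    # Constrain decoding so the output always parses as a tool call.
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "tool_call", "schema": tool_call_schema},
    },
}

# The payload is plain JSON, ready to POST to the server.
print(json.dumps(payload)[:50])
```

Because the system prompt and schema repeat on every call, RadixAttention caches their KV state after the first request, which is where the latency win for agent loops comes from.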

Quick Start

Launch the SGLang server with a Llama 3-8B model on port 30000.
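A minimal launch command, assuming SGLang is installed (e.g. `pip install "sglang[all]"`) and you have access to the gated Llama 3 weights on Hugging Face; the model path shown is an assumption standing in for "a Llama 3-8B model":

```shell
# Start the SGLang server; it exposes an OpenAI-compatible API on port 30000.
python -m sglang.launch_server \
  --model-path meta-llama/Meta-Llama-3-8B-Instruct \
  --port 30000
```

Once the server is up, clients can send requests to `http://localhost:30000/v1/chat/completions`.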

Dependency Matrix

Required Modules

  • sglang
  • torch
  • transformers

Components

references

💻 Claude Code Installation

Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.

Please help me install this Skill:
Name: uv-sglang
Download link: https://github.com/uv-xiao/pkbllm/archive/main.zip#uv-sglang

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
