llm-inference-batching-scheduler

Community

Optimize LLM inference batching.

Author: Zurybr
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill addresses the challenge of designing batch schedulers for LLM inference on compilation-based accelerators, minimizing serving cost while adhering to strict latency requirements.

Core Features & Use Cases

  • Cost Optimization: Reduces compilation costs by minimizing unique shapes and padding overhead.
  • Latency Management: Balances batching strategies to meet P95 and P99 latency thresholds.
  • Use Case: When deploying LLMs on TPUs, this skill helps design a scheduler that groups incoming requests so as to reduce expensive shape compilations and cut computation wasted on padding, while keeping response times fast.
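The shape-reduction idea above can be sketched as bucketed batching: round each request's sequence length up to one of a small, fixed set of bucket sizes, so the accelerator compiles at most one kernel per bucket and padding waste stays bounded. This is a minimal illustrative sketch; the bucket sizes and helper names are assumptions, not part of the Skill itself.

```python
# Hypothetical bucket sizes; a real deployment would derive these
# from the observed request-length distribution.
BUCKETS = [128, 256, 512, 1024, 2048]

def pick_bucket(seq_len: int, buckets=BUCKETS) -> int:
    """Return the smallest bucket that fits seq_len.

    Padding every request up to a bucket boundary caps the number
    of unique shapes the compiler ever sees at len(buckets).
    """
    for b in buckets:
        if seq_len <= b:
            return b
    raise ValueError(f"sequence length {seq_len} exceeds max bucket {buckets[-1]}")

def batch_by_bucket(requests):
    """Group (request_id, seq_len) pairs by padded bucket.

    Each resulting group shares one padded shape, so it can be
    batched and dispatched without triggering a new compilation.
    """
    groups: dict[int, list] = {}
    for req_id, seq_len in requests:
        groups.setdefault(pick_bucket(seq_len), []).append(req_id)
    return groups
```

The trade-off is explicit: fewer buckets means fewer compilations but more padding per request; more buckets means the reverse.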

Quick Start

Analyze the request distribution and cost model to derive optimal generation bucket sizes and shape configurations for LLM inference batching.
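One simple way to carry out this analysis, sketched below under assumed inputs (the function names and quantile strategy are illustrative, not the Skill's actual method), is to place bucket boundaries at evenly spaced quantiles of the observed length distribution and then measure the padding waste they induce:

```python
def quantile_buckets(lengths, n_buckets: int):
    """Pick bucket boundaries at evenly spaced quantiles of the
    sorted length distribution; the last bucket covers the maximum."""
    xs = sorted(lengths)
    bounds = []
    for i in range(1, n_buckets + 1):
        idx = min(len(xs) - 1, round(i * len(xs) / n_buckets) - 1)
        bounds.append(xs[idx])
    # Deduplicate while preserving order, in case quantiles collide.
    out = []
    for b in bounds:
        if not out or b > out[-1]:
            out.append(b)
    return out

def padding_waste(lengths, bounds):
    """Total padded tokens minus real tokens under these buckets."""
    waste = 0
    for length in lengths:
        bucket = next(x for x in bounds if length <= x)
        waste += bucket - length
    return waste
```

Comparing `padding_waste` across candidate bucket counts makes the compilation-vs-padding trade-off quantitative: each extra bucket adds one compiled shape but reduces total padded tokens.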

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: Let Claude install it automatically. Simply copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: llm-inference-batching-scheduler
Download link: https://github.com/Zurybr/lefarma-skills/archive/main.zip#llm-inference-batching-scheduler

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
