caching-architecture
Multi-tier caching for faster LiteLLM-RS responses.
Category: Community
Author: majiayu000
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
In high-traffic gateway scenarios, LiteLLM-RS users often pay unnecessary latency and cost for repeated identical requests. This Skill provides a structured multi-tier caching architecture that transparently reuses prior results, cutting both latency and cost.
Core Features & Use Cases
- In-Memory L1 Cache: microsecond-latency lookups with LRU eviction for recently used responses.
- Redis L2 Cache: exact-match caching with TTL-based expiration for scalable, shared persistence.
- Semantic L3 Cache: vector-store-backed caching (Qdrant/Weaviate/Pinecone) that returns cached results for semantically similar prompts.
- Use Case: a chat gateway that processes thousands of identical user queries per minute can reuse prior responses instead of recomputing.
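The L1 tier described above can be sketched as a small LRU cache over Rust's standard collections. This is a minimal, hypothetical illustration, not LiteLLM-RS's actual implementation: the `LruCache` type and its methods are assumptions for the example.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical in-memory L1 cache: response strings keyed by a request
/// hash, with least-recently-used eviction at a fixed capacity.
struct LruCache {
    capacity: usize,
    map: HashMap<String, String>,
    order: VecDeque<String>, // front = least recently used
}

impl LruCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    fn get(&mut self, key: &str) -> Option<String> {
        if let Some(v) = self.map.get(key).cloned() {
            // Mark as most recently used by moving the key to the back.
            self.order.retain(|k| k.as_str() != key);
            self.order.push_back(key.to_string());
            Some(v)
        } else {
            None
        }
    }

    fn put(&mut self, key: &str, value: &str) {
        if self.map.contains_key(key) {
            // Refresh an existing key's recency position.
            self.order.retain(|k| k.as_str() != key);
        } else if self.map.len() >= self.capacity {
            // Evict the least recently used entry to make room.
            if let Some(lru) = self.order.pop_front() {
                self.map.remove(&lru);
            }
        }
        self.map.insert(key.to_string(), value.to_string());
        self.order.push_back(key.to_string());
    }
}

fn main() {
    let mut l1 = LruCache::new(2);
    l1.put("q1", "answer one");
    l1.put("q2", "answer two");
    l1.get("q1");                 // q1 becomes most recently used
    l1.put("q3", "answer three"); // evicts q2, the LRU entry
    assert!(l1.get("q2").is_none());
    assert_eq!(l1.get("q1").as_deref(), Some("answer one"));
}
```

In the gateway use case above, hot identical queries stay in this tier and never reach Redis or the upstream model.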
Quick Start
Configure the LiteLLM-RS cache in your config, initialize the CacheManager with the cache settings, and wrap the request/response cycle to get from cache first, then populate caches after a successful response.
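The cache-first flow above can be sketched as follows. The `CacheManager` shape, the `handle_request` wrapper, and the in-memory stand-in for the Redis tier are all assumptions for illustration; LiteLLM-RS's real configuration keys and API may differ.

```rust
use std::collections::HashMap;

/// Hypothetical two-tier manager: `l1` is the in-memory tier and `l2`
/// stands in for Redis (a plain HashMap keeps the sketch self-contained).
struct CacheManager {
    l1: HashMap<String, String>,
    l2: HashMap<String, String>,
}

impl CacheManager {
    fn new() -> Self {
        Self { l1: HashMap::new(), l2: HashMap::new() }
    }

    /// Check L1 first, then L2; on an L2 hit, promote the entry into L1.
    fn get(&mut self, key: &str) -> Option<String> {
        if let Some(v) = self.l1.get(key) {
            return Some(v.clone());
        }
        if let Some(v) = self.l2.get(key).cloned() {
            self.l1.insert(key.to_string(), v.clone()); // promote to L1
            return Some(v);
        }
        None
    }

    /// Populate all tiers after a successful upstream response.
    fn put(&mut self, key: &str, value: &str) {
        self.l1.insert(key.to_string(), value.to_string());
        self.l2.insert(key.to_string(), value.to_string());
    }
}

/// Wrap the request/response cycle: serve from cache first, call the
/// upstream model only on a miss, then populate the caches.
fn handle_request(cache: &mut CacheManager, prompt: &str) -> String {
    if let Some(hit) = cache.get(prompt) {
        return hit;
    }
    let response = call_upstream(prompt);
    cache.put(prompt, &response);
    response
}

/// Placeholder for the real LLM gateway call.
fn call_upstream(prompt: &str) -> String {
    format!("response to: {prompt}")
}

fn main() {
    let mut cache = CacheManager::new();
    let first = handle_request(&mut cache, "hello");  // miss: hits upstream
    let second = handle_request(&mut cache, "hello"); // served from cache
    assert_eq!(first, second);
}
```

A real deployment would replace the `l2` HashMap with a Redis client and TTLs, and add the semantic L3 lookup between the L2 miss and the upstream call.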
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: caching-architecture Download link: https://github.com/majiayu000/litellm-rs/archive/main.zip#caching-architecture Please download this .zip file, extract it, and install it in the .claude/skills/ directory.