caching-architecture
Multi-tier caching for faster LiteLLM-RS responses.
Category: Community
Author: majiayu000
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
In high-traffic gateway scenarios, LiteLLM-RS users often pay unnecessary latency and cost for repeated identical requests. This Skill provides a structured multi-tier caching architecture that transparently reuses prior results, cutting both latency and cost.
Core Features & Use Cases
- In-Memory L1 Cache: microsecond-latency lookups with LRU eviction for recently used responses.
- Redis L2 Cache: exact-match caching with TTL-based expiration for scalable, shared persistence.
- Semantic L3 Cache: vector-store-backed caching (Qdrant/Weaviate/Pinecone) that returns cached results for semantically similar prompts.
- Use Case: a chat gateway that processes thousands of identical user queries per minute can reuse prior responses instead of recomputing.
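The L1 tier described above can be sketched as a small LRU cache over Rust's standard collections. This is a minimal, hypothetical illustration, not LiteLLM-RS's actual implementation: the `LruCache` type and its methods are assumptions for the example.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical in-memory L1 cache: response strings keyed by a request
/// hash, with least-recently-used eviction at a fixed capacity.
struct LruCache {
    capacity: usize,
    map: HashMap<String, String>,
    order: VecDeque<String>, // front = least recently used
}

impl LruCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    fn get(&mut self, key: &str) -> Option<String> {
        if let Some(v) = self.map.get(key).cloned() {
            // Mark as most recently used by moving the key to the back.
            self.order.retain(|k| k.as_str() != key);
            self.order.push_back(key.to_string());
            Some(v)
        } else {
            None
        }
    }

    fn put(&mut self, key: &str, value: &str) {
        if self.map.contains_key(key) {
            // Refresh an existing key's recency position.
            self.order.retain(|k| k.as_str() != key);
        } else if self.map.len() >= self.capacity {
            // Evict the least recently used entry to make room.
            if let Some(lru) = self.order.pop_front() {
                self.map.remove(&lru);
            }
        }
        self.map.insert(key.to_string(), value.to_string());
        self.order.push_back(key.to_string());
    }
}

fn main() {
    let mut l1 = LruCache::new(2);
    l1.put("q1", "answer one");
    l1.put("q2", "answer two");
    l1.get("q1");                 // q1 becomes most recently used
    l1.put("q3", "answer three"); // evicts q2, the LRU entry
    assert!(l1.get("q2").is_none());
    assert_eq!(l1.get("q1").as_deref(), Some("answer one"));
}
```

In the gateway use case above, hot identical queries stay in this tier and never reach Redis or the upstream model.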
Quick Start
Configure the LiteLLM-RS cache in your config, initialize the CacheManager with the cache settings, and wrap the request/response cycle to get from cache first, then populate caches after a successful response.
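The cache-first flow above can be sketched as follows. The `CacheManager` shape, the `handle_request` wrapper, and the in-memory stand-in for the Redis tier are all assumptions for illustration; LiteLLM-RS's real configuration keys and API may differ.

```rust
use std::collections::HashMap;

/// Hypothetical two-tier manager: `l1` is the in-memory tier and `l2`
/// stands in for Redis (a plain HashMap keeps the sketch self-contained).
struct CacheManager {
    l1: HashMap<String, String>,
    l2: HashMap<String, String>,
}

impl CacheManager {
    fn new() -> Self {
        Self { l1: HashMap::new(), l2: HashMap::new() }
    }

    /// Check L1 first, then L2; on an L2 hit, promote the entry into L1.
    fn get(&mut self, key: &str) -> Option<String> {
        if let Some(v) = self.l1.get(key) {
            return Some(v.clone());
        }
        if let Some(v) = self.l2.get(key).cloned() {
            self.l1.insert(key.to_string(), v.clone()); // promote to L1
            return Some(v);
        }
        None
    }

    /// Populate all tiers after a successful upstream response.
    fn put(&mut self, key: &str, value: &str) {
        self.l1.insert(key.to_string(), value.to_string());
        self.l2.insert(key.to_string(), value.to_string());
    }
}

/// Wrap the request/response cycle: serve from cache first, call the
/// upstream model only on a miss, then populate the caches.
fn handle_request(cache: &mut CacheManager, prompt: &str) -> String {
    if let Some(hit) = cache.get(prompt) {
        return hit;
    }
    let response = call_upstream(prompt);
    cache.put(prompt, &response);
    response
}

/// Placeholder for the real LLM gateway call.
fn call_upstream(prompt: &str) -> String {
    format!("response to: {prompt}")
}

fn main() {
    let mut cache = CacheManager::new();
    let first = handle_request(&mut cache, "hello");  // miss: hits upstream
    let second = handle_request(&mut cache, "hello"); // served from cache
    assert_eq!(first, second);
}
```

A real deployment would replace the `l2` HashMap with a Redis client and TTLs, and add the semantic L3 lookup between the L2 miss and the upstream call.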
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: caching-architecture Download link: https://github.com/majiayu000/litellm-rs/archive/main.zip#caching-architecture Please download this .zip file, extract it, and install it in the .claude/skills/ directory.