caching-architecture

Community

Multi-tier caching for faster LiteLLM-RS responses.

Author: majiayu000
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

LiteLLM-RS users often incur unnecessary latency and higher costs due to repeated identical requests in high-traffic gateway scenarios. This Skill provides a structured multi-tier caching architecture to transparently reuse results and cut both latency and cost.

Core Features & Use Cases

  • In-Memory L1 Cache: microsecond latency with LRU eviction to store recent responses.
  • Redis L2 Cache: exact-match caching with TTL-based expiration for scalable persistence.
  • Semantic L3 Cache: vector-store backed caching (Qdrant/Weaviate/Pinecone) for near-similar results.
  • Use Case: a chat gateway that processes thousands of identical user queries per minute can reuse prior responses instead of recomputing.
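The L1 tier above can be sketched in Rust. The `LruCache` type below is a minimal, illustrative in-memory cache with LRU eviction built from std containers only; it is an assumption for explanation, not LiteLLM-RS's actual implementation.

```rust
use std::collections::{HashMap, VecDeque};

/// Minimal in-memory L1 cache with LRU eviction (illustrative only;
/// not the LiteLLM-RS implementation).
struct LruCache {
    capacity: usize,
    map: HashMap<String, String>,
    // Front = least recently used, back = most recently used.
    order: VecDeque<String>,
}

impl LruCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    fn get(&mut self, key: &str) -> Option<String> {
        if let Some(v) = self.map.get(key).cloned() {
            self.touch(key);
            Some(v)
        } else {
            None
        }
    }

    fn put(&mut self, key: &str, value: &str) {
        if self.map.contains_key(key) {
            self.touch(key);
        } else {
            if self.map.len() >= self.capacity {
                // Evict the least recently used entry.
                if let Some(lru) = self.order.pop_front() {
                    self.map.remove(&lru);
                }
            }
            self.order.push_back(key.to_string());
        }
        self.map.insert(key.to_string(), value.to_string());
    }

    // Move a key to the most-recently-used position.
    fn touch(&mut self, key: &str) {
        if let Some(pos) = self.order.iter().position(|k| k == key) {
            let k = self.order.remove(pos).unwrap();
            self.order.push_back(k);
        }
    }
}

fn main() {
    let mut l1 = LruCache::new(2);
    l1.put("prompt-a", "response-a");
    l1.put("prompt-b", "response-b");
    l1.get("prompt-a");               // "prompt-a" is now most recent
    l1.put("prompt-c", "response-c"); // evicts "prompt-b"
    assert_eq!(l1.get("prompt-b"), None);
    assert_eq!(l1.get("prompt-a"), Some("response-a".to_string()));
}
```

A production L2 would replace the `HashMap` with a Redis client and rely on Redis TTLs instead of explicit eviction; an L3 would key lookups by embedding similarity rather than exact match.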

Quick Start

  • Enable the cache in your LiteLLM-RS configuration.
  • Initialize the CacheManager with those cache settings.
  • Wrap the request/response cycle: try the cache first, and populate all tiers after a successful response.
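The get-then-populate cycle can be sketched as follows. All names here (`CacheTier`, `CacheManager`, `MapTier`) are hypothetical stand-ins for illustration, not LiteLLM-RS's actual API; the toy `MapTier` substitutes for the real in-memory, Redis, and vector-store tiers.

```rust
use std::collections::HashMap;

/// Hypothetical tier interface; real tiers would wrap an in-process
/// LRU map, a Redis client, and a vector store respectively.
trait CacheTier {
    fn get(&self, key: &str) -> Option<String>;
    fn put(&mut self, key: &str, value: &str);
}

/// Toy tier backed by a HashMap, standing in for L1/L2/L3.
struct MapTier { store: HashMap<String, String> }

impl CacheTier for MapTier {
    fn get(&self, key: &str) -> Option<String> { self.store.get(key).cloned() }
    fn put(&mut self, key: &str, value: &str) {
        self.store.insert(key.to_string(), value.to_string());
    }
}

struct CacheManager { tiers: Vec<Box<dyn CacheTier>> }

impl CacheManager {
    /// Check tiers fastest-first; on a hit, backfill the faster tiers
    /// that missed so the next lookup stops earlier.
    fn get(&mut self, key: &str) -> Option<String> {
        for i in 0..self.tiers.len() {
            if let Some(v) = self.tiers[i].get(key) {
                for j in 0..i {
                    self.tiers[j].put(key, &v);
                }
                return Some(v);
            }
        }
        None
    }

    /// After a successful upstream response, populate every tier.
    fn put(&mut self, key: &str, value: &str) {
        for tier in &mut self.tiers {
            tier.put(key, value);
        }
    }
}

fn main() {
    let mut mgr = CacheManager {
        tiers: vec![
            Box::new(MapTier { store: HashMap::new() }), // L1 (memory)
            Box::new(MapTier { store: HashMap::new() }), // L2 (Redis stand-in)
        ],
    };
    // Cache-first request cycle: miss -> call the model -> populate.
    let key = "hash-of-request";
    if mgr.get(key).is_none() {
        let response = "model output"; // stand-in for the upstream LLM call
        mgr.put(key, response);
    }
    assert_eq!(mgr.get(key), Some("model output".to_string()));
}
```

The backfill step in `get` is the key design choice: a hit in a slower tier promotes the entry into the faster tiers, so hot requests migrate toward microsecond-latency storage on their own.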

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install it automatically. Simply copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: caching-architecture
Download link: https://github.com/majiayu000/litellm-rs/archive/main.zip#caching-architecture

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
