# Caching

Cache model responses to reduce API costs and speed up iteration.
## Enable Caching

```yaml
cache:
  enabled: true
  backend: sqlite
  path: .cache/insidellms.db
```
## Cache Backends

| Backend | Persistence | Speed | Use Case |
|---|---|---|---|
| `memory` | Session only | Fastest | Testing |
| `sqlite` | Disk | Fast | Development |
| `redis` | Network | Medium | Team sharing |
### Memory Cache

```yaml
cache:
  enabled: true
  backend: memory
```

Session-only; cleared when the process exits.
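Conceptually, the memory backend is just a process-local dictionary, which is why nothing survives a restart. The `MemoryCache` class below is an illustrative sketch, not the actual insidellms API:

```python
# Hypothetical sketch of a memory backend: a plain dict, so all
# entries are lost when the Python process exits.
class MemoryCache:
    def __init__(self):
        self._store = {}

    def get(self, key):
        # Returns None on a miss.
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value

    def clear(self):
        self._store.clear()


cache = MemoryCache()
cache.set("k", "cached response")
print(cache.get("k"))  # cached response
```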
### SQLite Cache

```yaml
cache:
  enabled: true
  backend: sqlite
  path: .cache/responses.db
```

Persistent; survives restarts.
### Redis Cache

```yaml
cache:
  enabled: true
  backend: redis
  url: redis://localhost:6379/0
```

Shared across machines; requires a Redis server.
## Cache Key Generation

Cache keys include:

- Model identifier
- Prompt/messages content
- Generation parameters (`temperature`, `max_tokens`, etc.)

```
cache_key = hash(model_id + prompt + sorted(kwargs))
```

This means:

- Same prompt + same params = cache hit
- Different temperature = cache miss
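The pseudocode above can be made concrete. The `cache_key` function and SHA-256 scheme here are an illustrative sketch, not the library's actual implementation; sorting the parameters ensures that argument order never changes the key:

```python
import hashlib
import json

def cache_key(model_id, prompt, **kwargs):
    # Sort generation parameters so that keyword order is irrelevant,
    # then hash a stable serialization of all inputs.
    payload = json.dumps([model_id, prompt, sorted(kwargs.items())])
    return hashlib.sha256(payload.encode()).hexdigest()

k1 = cache_key("some-model", "Hello", temperature=0.0, max_tokens=64)
k2 = cache_key("some-model", "Hello", max_tokens=64, temperature=0.0)
k3 = cache_key("some-model", "Hello", temperature=0.7, max_tokens=64)
assert k1 == k2  # same prompt + same params = cache hit
assert k1 != k3  # different temperature = cache miss
```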
## Cache Invalidation

### Clear All

```bash
rm -rf .cache/insidellms.db
```

### Clear Programmatically

```python
cache.clear()
```
### TTL-based Expiration

```yaml
cache:
  enabled: true
  backend: sqlite
  ttl_seconds: 86400  # 24 hours
```
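TTL expiration reduces to a timestamp comparison: an entry is stale once `ttl_seconds` have elapsed since it was stored. The `is_expired` helper below is an illustrative sketch, not actual insidellms code:

```python
import time

def is_expired(stored_at, ttl_seconds, now=None):
    # An entry expires once ttl_seconds have passed since it was stored.
    now = time.time() if now is None else now
    return (now - stored_at) > ttl_seconds

# One hour into a 24-hour TTL: still fresh.
assert not is_expired(stored_at=1_000, ttl_seconds=86_400, now=1_000 + 3_600)
# 25 hours in: expired.
assert is_expired(stored_at=1_000, ttl_seconds=86_400, now=1_000 + 90_000)
```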
## When to Disable Caching

- **Evaluating model updates:** need fresh responses
- **Testing randomness:** want different responses each time
- **Production benchmarks:** measure actual latency

```yaml
cache:
  enabled: false
```

Or via the CLI:

```bash
insidellms run config.yaml --no-cache
```
## Cache Statistics

```python
stats = cache.stats()
print(f"Hits: {stats['hits']}")
print(f"Misses: {stats['misses']}")
print(f"Hit rate: {stats['hit_rate']:.1%}")
```
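A backend can derive these numbers by counting lookups. The `CountingCache` class below is a hypothetical sketch that only illustrates how `hits`, `misses`, and `hit_rate` relate; it is not the insidellms implementation:

```python
class CountingCache:
    """Illustrative cache that tracks hit/miss statistics."""

    def __init__(self):
        self._store = {}
        self._hits = 0
        self._misses = 0

    def get(self, key):
        if key in self._store:
            self._hits += 1
            return self._store[key]
        self._misses += 1
        return None

    def set(self, key, value):
        self._store[key] = value

    def stats(self):
        total = self._hits + self._misses
        return {
            "hits": self._hits,
            "misses": self._misses,
            # Avoid division by zero before any lookups happen.
            "hit_rate": self._hits / total if total else 0.0,
        }


cache = CountingCache()
cache.get("q1")            # miss
cache.set("q1", "answer")
cache.get("q1")            # hit
print(cache.stats())       # 1 hit, 1 miss, 50% hit rate
```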
## Best Practices

### Do

- Enable caching during development
- Use SQLite for persistence
- Set an appropriate TTL for time-sensitive data
- Clear the cache when changing model behaviour

### Don't

- Cache in production benchmarks
- Share a cache between different model versions
- Forget to invalidate after model updates
## Determinism Note

Caching can affect determinism:

- **With cache:** faster, but responses depend on cache state
- **Without cache:** slower, but fresh responses each time

For CI diff-gating with `DummyModel`, caching doesn't matter (responses are fixed).
## See Also

- Performance and Caching - more performance options
- Rate Limiting - combine with caching