Caching

Caching stores model responses and replays them when the same request is made again: reduce costs, speed up iteration.

Enabling Caching

cache:
  enabled: true
  backend: sqlite
  path: .cache/insidellms.db

Cache Backends

Backend   Persistence    Speed     Use Case
memory    Session only   Fastest   Testing
sqlite    Disk           Fast      Development
redis     Network        Medium    Team sharing

Memory Cache

cache:
  enabled: true
  backend: memory

Session-only; cleared when process exits.
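
Conceptually, the memory backend is little more than a dict keyed by the cache key. A minimal sketch of the idea (illustrative, not the library's actual implementation):

class MemoryCache:
    """Toy in-memory cache: entries live only as long as the process."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)  # None on a miss

    def set(self, key, value):
        self._store[key] = value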

SQLite Cache

cache:
  enabled: true
  backend: sqlite
  path: .cache/responses.db

Persistent; survives restarts.
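
The same get/set idea backed by a database file is what makes the cache survive restarts. A minimal sketch using the standard-library sqlite3 module (the library's real schema and serialization will differ):

import sqlite3
from pathlib import Path

class SqliteCache:
    """Toy persistent cache: one table mapping cache keys to responses."""

    def __init__(self, path=".cache/responses.db"):
        Path(path).parent.mkdir(parents=True, exist_ok=True)
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT)"
        )

    def get(self, key):
        row = self._conn.execute(
            "SELECT value FROM cache WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None

    def set(self, key, value):
        self._conn.execute(
            "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)",
            (key, value),
        )
        self._conn.commit()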

Redis Cache

cache:
  enabled: true
  backend: redis
  url: redis://localhost:6379/0

Shared across machines; requires a running Redis server.
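
With the redis-py client, the equivalent lookup is roughly as follows (a sketch; it assumes a Redis server is listening at the configured URL):

import redis

# Connect using the same URL as the config above.
r = redis.Redis.from_url("redis://localhost:6379/0", decode_responses=True)

def cache_get(key):
    return r.get(key)  # None on a miss

def cache_set(key, value):
    r.set(key, value)

Because the store lives in the Redis server, every machine pointed at the same URL shares one cache.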

Cache Key Generation

Cache keys include:

  • Model identifier
  • Prompt/messages content
  • Generation parameters (temperature, max_tokens, etc.)

Conceptually:

cache_key = hash(model_id + prompt + sorted(kwargs))

This means:

  • Same prompt + same params = cache hit
  • Different temperature = cache miss
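
A runnable version of the scheme above (the hashing and payload layout here are illustrative; the library's exact key format may differ):

import hashlib
import json

def make_cache_key(model_id, prompt, **params):
    """Stable key: identical inputs always hash to the same digest."""
    payload = json.dumps(
        {"model": model_id, "prompt": prompt, "params": params},
        sort_keys=True,  # order-independent: f(a=1, b=2) == f(b=2, a=1)
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Same prompt + same params -> same key (hit)
assert make_cache_key("m1", "Hi", temperature=0.0) == \
       make_cache_key("m1", "Hi", temperature=0.0)

# Different temperature -> different key (miss)
assert make_cache_key("m1", "Hi", temperature=0.0) != \
       make_cache_key("m1", "Hi", temperature=0.7)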

Cache Invalidation

Clear All

rm -rf .cache/insidellms.db

Clear Programmatically

cache.clear()  # drop every stored response

TTL-based Expiration

cache:
  enabled: true
  backend: sqlite
  ttl_seconds: 86400  # 24 hours
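
TTL expiry amounts to storing a timestamp with each entry and treating anything older than ttl_seconds as a miss. A sketch of the idea (not the library's code):

import time

def set_with_ttl(store, key, value):
    store[key] = (value, time.time())  # remember when it was written

def get_with_ttl(store, key, ttl_seconds=86400):
    entry = store.get(key)
    if entry is None:
        return None                    # never cached
    value, stored_at = entry
    if time.time() - stored_at > ttl_seconds:
        del store[key]                 # expired: evict and report a miss
        return None
    return value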

When to Disable Caching

  • Evaluating model updates: Need fresh responses
  • Testing randomness: Want different responses each time
  • Production benchmarks: Measure actual latency

In the config:

cache:
  enabled: false

Or via CLI:

insidellms run config.yaml --no-cache

Cache Statistics

stats = cache.stats()
print(f"Hits: {stats['hits']}")
print(f"Misses: {stats['misses']}")
print(f"Hit rate: {stats['hit_rate']:.1%}")
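
Hit rate is hits / (hits + misses). A low rate across repeated runs usually means the prompts or generation parameters are changing between runs, so each request produces a new cache key.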

Best Practices

Do

  • Enable caching during development
  • Use SQLite for persistence
  • Set appropriate TTL for time-sensitive data
  • Clear cache when changing model behaviour

Don’t

  • Cache in production benchmarks
  • Share cache between different model versions
  • Forget to invalidate after model updates

Determinism Note

Caching can affect determinism:

  • With cache: Faster, but responses depend on cache state
  • Without cache: Slower, but fresh responses each time

For CI diff-gating with DummyModel, caching doesn’t matter (responses are fixed).

See Also