Guides

How-to guides for specific tasks and integrations.

Available Guides

| Guide | Description |
| --- | --- |
| Caching | Speed up runs with response caching |
| Rate Limiting | Handle API rate limits gracefully |
| Experiment Tracking | Log runs to W&B, MLflow, TensorBoard |
| Local Models | Run Ollama, llama.cpp, vLLM |
| Production Shadow Capture | Capture sampled production traffic into canonical records |
| IDE Extension Workflow | Run probes from VS Code/Cursor CodeLens |
| Troubleshooting | Common issues and solutions |

Guides vs Tutorials

| Tutorials | Guides |
| --- | --- |
| Learning-oriented | Task-oriented |
| Step-by-step walkthroughs | Focused how-tos |
| Build something complete | Solve a specific problem |
| “How do I get started?” | “How do I do X?” |

Quick Solutions

Speed Up Runs

insidellms run config.yaml --async --concurrency 10

Reduce Costs

# Limit examples and enable cache middleware
max_examples: 50
pipeline:
  middlewares:
    - type: cache
      args:
        cache_size: 1000

Run Offline

# Use DummyModel for testing
models:
  - type: dummy

Debug Failures

# Check environment
insidellms doctor

# Validate artifacts
insidellms validate ./my_run

# Verbose output
insidellms run config.yaml --verbose
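
The first two checks above can be chained so that triage stops at the first failing stage. This is a minimal sketch rather than a feature of insideLLMs: the function name `triage_run` and the `INSIDELLMS_BIN` override are hypothetical, and it assumes `insidellms doctor` and `insidellms validate` exit non-zero when they find a problem.

```shell
#!/usr/bin/env bash
# Sketch only: triage_run and INSIDELLMS_BIN are hypothetical names,
# not part of the insidellms CLI.

triage_run() {
  local run_dir="$1"
  local cli="${INSIDELLMS_BIN:-insidellms}"

  # Check the environment first; a broken setup makes later errors misleading.
  # Assumption: doctor exits non-zero when a check fails.
  if ! "$cli" doctor; then
    echo "environment check failed" >&2
    return 1
  fi

  # Validate the artifacts in the run directory.
  # Assumption: validate exits non-zero on invalid artifacts.
  if ! "$cli" validate "$run_dir"; then
    echo "artifact validation failed for $run_dir" >&2
    return 2
  fi

  echo "environment and artifacts passed; next step: rerun with --verbose"
}
```

Calling `triage_run ./my_run` returns 1 if the environment check fails and 2 if artifact validation fails, so the exit status shows which stage to look at before rerunning with `--verbose`.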

Common Patterns

Development Workflow

# 1. Generate and tune a local config
insidellms init dev.yaml --model dummy

# 2. Run a deterministic harness candidate
insidellms harness dev.yaml --run-dir .tmp/runs/dev --overwrite

# 3. Full run
insidellms run config.yaml --run-dir .tmp/runs/full --overwrite

CI Workflow

# 1. Run candidate
insidellms harness config.yaml --run-dir ./candidate

# 2. Diff against baseline
insidellms diff ./baseline ./candidate --fail-on-changes
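
The two CI steps can be wrapped in one function so that a pipeline job fails as soon as either step does. This is a hedged sketch, not the project's own CI script: the function name `ci_gate` and the `INSIDELLMS_BIN` override are hypothetical, and it assumes `insidellms diff --fail-on-changes` exits non-zero when the candidate differs from the baseline.

```shell
#!/usr/bin/env bash
# Sketch only: ci_gate and INSIDELLMS_BIN are hypothetical names,
# not part of the insidellms CLI.

ci_gate() {
  local config="$1"
  local baseline="$2"
  local candidate="$3"
  local cli="${INSIDELLMS_BIN:-insidellms}"

  # 1. Run candidate; stop if the harness itself fails.
  "$cli" harness "$config" --run-dir "$candidate" || return 1

  # 2. Diff against baseline.
  # Assumption: --fail-on-changes makes diff exit non-zero on any change.
  "$cli" diff "$baseline" "$candidate" --fail-on-changes || return 2
}
```

A CI job could then call `ci_gate config.yaml ./baseline ./candidate`. A return status of 1 points at the harness run, and 2 points at a behavioral change relative to the baseline.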
