# Concepts
Deep dives into how insideLLMs works.
## Core Concepts
| Concept | Description |
|---|---|
| Models | Unified interface for LLM providers |
| Probes | Focused behavioural tests |
| Runners | Orchestration and execution |
| Datasets | Input formats and loading |
| Determinism | Reproducibility guarantees |
| Artifacts | Output files and schemas |
## How It All Fits Together

```mermaid
graph TB
    subgraph Inputs
        Config[Config YAML]
        Dataset[Dataset]
        APIKeys[API Keys]
    end
    subgraph Framework
        Registry[Registry]
        Runner[Runner]
        Model[Model]
        Probe[Probe]
    end
    subgraph Outputs
        Records[records.jsonl]
        Summary[summary.json]
        Report[report.html]
    end

    Config --> Registry
    APIKeys --> Model
    Registry --> Runner
    Dataset --> Runner
    Model --> Runner
    Probe --> Runner
    Runner --> Records
    Records --> Summary
    Records --> Report
```
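For orientation, here is a rough sketch of how these pieces might be wired up in code. The registry helper names and the import path are assumptions made for illustration; only `ProbeRunner` and its `run()` call appear in the overview further down this page.

```python
# Hypothetical wiring sketch -- the registry helpers and import path are
# assumptions for illustration, not the documented API.
from insidellms import ProbeRunner, registry  # assumed import path

model = registry.get_model("openai:gpt-4o-mini")   # built from config + API keys
probe = registry.get_probe("prompt_injection")     # a focused behavioural test
dataset = registry.load_dataset("inputs.jsonl")    # inputs fed to the probe

runner = ProbeRunner(model, probe)
results = runner.run(dataset)  # produces records.jsonl
# summary.json and report.html are derived from the records
```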
## Quick Concept Overview

### Models
All LLM providers (OpenAI, Anthropic, local) share a unified interface:
```python
model.generate("prompt")   # Text completion
model.chat([messages])     # Multi-turn conversation
model.info()               # Provider metadata
```
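As a rough end-to-end sketch (the concrete model class name, constructor arguments, and chat message format here are assumptions; only the three methods above come from the interface):

```python
# Illustrative only -- class name, constructor, and message format are assumptions.
model = OpenAIModel(model_name="gpt-4o-mini")

text = model.generate("Summarise insideLLMs in one sentence.")
reply = model.chat([
    {"role": "system", "content": "Be terse."},
    {"role": "user", "content": "What does a probe do?"},
])
print(model.info())  # e.g. provider, model id, context window
```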
### Probes
Probes test specific behaviours:
```python
probe.run(model, input_data)     # Execute one test
probe.run_batch(model, dataset)  # Execute many
probe.score(results)             # Evaluate outcomes
```
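A custom probe might look roughly like this. Only the `run`/`run_batch`/`score` names come from the interface above; the result shape and the absence of a base class are assumptions for the sake of a self-contained sketch:

```python
# Hypothetical probe -- result shape and lack of a base class are assumptions.
class KeywordEchoProbe:
    """Checks whether the model repeats a required keyword verbatim."""

    def run(self, model, input_data):
        output = model.generate(input_data["prompt"])
        return {"output": output, "keyword": input_data["keyword"]}

    def run_batch(self, model, dataset):
        return [self.run(model, item) for item in dataset]

    def score(self, results):
        hits = sum(r["keyword"] in r["output"] for r in results)
        return {"accuracy": hits / len(results)}
```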
### Runners
Runners orchestrate execution:
```python
runner = ProbeRunner(model, probe)
results = runner.run(dataset)

# or async:
results = await AsyncProbeRunner(model, probe).run(dataset, concurrency=10)
```
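Because `await` only works inside a coroutine, a complete async invocation would be wrapped in something like `asyncio.run`; the import path of `AsyncProbeRunner` below is an assumption:

```python
import asyncio

from insidellms import AsyncProbeRunner  # assumed import path

async def main():
    runner = AsyncProbeRunner(model, probe)            # model/probe built as above
    return await runner.run(dataset, concurrency=10)   # bounded parallelism

results = asyncio.run(main())
```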
### Determinism
Same inputs → identical outputs:
- Run IDs are content hashes
- Timestamps derive from run ID
- Artifacts have stable formatting
This enables CI diff-gating.
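For intuition, a content-hash run ID can be derived purely from the run's inputs, so reruns with identical inputs get identical IDs and, in turn, identical derived timestamps. The exact hashing scheme insideLLMs uses is not shown here; this is only an illustrative sketch:

```python
import hashlib
import json

def content_hash_run_id(config: dict, dataset_digest: str) -> str:
    """Derive a run ID from the inputs alone: same inputs, same ID."""
    payload = json.dumps(
        {"config": config, "dataset": dataset_digest},
        sort_keys=True,          # stable key order -> stable hash
        separators=(",", ":"),   # stable formatting
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
```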
## When to Read These
- New to insideLLMs? Start with Getting Started first
- Building something? Check Tutorials for hands-on guides
- Need details? Use Reference for complete API docs
- Want to understand? You’re in the right place!