# Concepts

Deep dives into how insideLLMs works.

## Core Concepts

| Concept | Description |
| --- | --- |
| Models | Unified interface for LLM providers |
| Probes | Focused behavioural tests |
| Runners | Orchestration and execution |
| Datasets | Input formats and loading |
| Determinism | Reproducibility guarantees |
| Artifacts | Output files and schemas |

## How It All Fits Together

```mermaid
graph TB
    subgraph Inputs
        Config[Config YAML]
        Dataset[Dataset]
        APIKeys[API Keys]
    end

    subgraph Framework
        Registry[Registry]
        Runner[Runner]
        Model[Model]
        Probe[Probe]
    end

    subgraph Outputs
        Records[records.jsonl]
        Summary[summary.json]
        Report[report.html]
    end

    Config --> Registry
    APIKeys --> Model
    Registry --> Runner
    Dataset --> Runner
    Model --> Runner
    Probe --> Runner
    Runner --> Records
    Records --> Summary
    Records --> Report
```
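
The `Records --> Summary` and `Records --> Report` edges are pure aggregation: every probe outcome lands as a line in `records.jsonl`, and the summary is computed from those lines. A minimal sketch of that step, assuming a hypothetical record schema with a boolean `passed` field:

```python
import json

# Hypothetical records; the real schema is defined by the framework.
records = [
    {"run_id": "abc123", "probe": "sycophancy", "passed": True},
    {"run_id": "abc123", "probe": "sycophancy", "passed": False},
]

# records.jsonl: one JSON object per line, keys in stable order.
with open("records.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec, sort_keys=True) + "\n")

# summary.json: aggregates derived entirely from the records.
passed = sum(r["passed"] for r in records)
summary = {"total": len(records), "passed": passed,
           "pass_rate": passed / len(records)}
with open("summary.json", "w") as f:
    json.dump(summary, f, indent=2, sort_keys=True)
```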

## Quick Concept Overview

### Models

All LLM providers (OpenAI, Anthropic, local) share a unified interface:

```python
model.generate("prompt")  # Text completion
model.chat([messages])    # Multi-turn conversation
model.info()              # Provider metadata
```
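
One way to picture the unified interface is as a structural type every provider satisfies. The Protocol below is an illustrative sketch built from the three calls above; the parameter and return types are assumptions, not the library's actual signatures:

```python
from typing import Protocol

class ModelLike(Protocol):
    """Sketch of the shared surface; method names come from the docs above,
    but the type annotations are assumptions."""

    def generate(self, prompt: str) -> str:
        """Single-shot text completion."""
        ...

    def chat(self, messages: list[dict]) -> str:
        """Multi-turn conversation over role/content messages (assumed shape)."""
        ...

    def info(self) -> dict:
        """Provider metadata, e.g. provider name and model ID (assumed keys)."""
        ...
```

Because every provider exposes the same surface, probes and runners never need to branch on which backend they are talking to.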

### Probes

Probes test specific behaviours:

```python
probe.run(model, input_data)     # Execute one test
probe.run_batch(model, dataset)  # Execute many
probe.score(results)             # Evaluate outcomes
```
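
A probe can be as small as a deterministic check over the model's output. The class below is a self-contained sketch of that contract, with a stub model standing in for a real provider; the probe name and result fields are invented for illustration:

```python
class EchoModel:
    """Stub provider for the demo (assumption, not a real backend)."""
    def generate(self, prompt: str) -> str:
        return prompt.upper()

class RefusalProbe:
    """Illustrative probe: flags outputs that look like refusals."""

    def run(self, model, input_data: str) -> dict:
        output = model.generate(input_data)
        return {"input": input_data, "output": output,
                "refused": "CANNOT" in output}

    def run_batch(self, model, dataset: list) -> list:
        return [self.run(model, item) for item in dataset]

    def score(self, results: list) -> float:
        # Fraction of inputs the model answered rather than refused.
        return sum(not r["refused"] for r in results) / len(results)

probe = RefusalProbe()
results = probe.run_batch(EchoModel(), ["hello", "I cannot help with that"])
print(probe.score(results))  # 0.5
```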

### Runners

Runners orchestrate execution:

```python
runner = ProbeRunner(model, probe)
results = runner.run(dataset)
# or async:
results = await AsyncProbeRunner(model, probe).run(dataset, concurrency=10)
```
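
The `concurrency=10` argument suggests a bounded fan-out. The snippet below shows that general pattern in plain `asyncio`; it is the standard technique, not necessarily AsyncProbeRunner's actual implementation:

```python
import asyncio

async def run_bounded(items, worker, concurrency: int = 10):
    """Run `worker` over `items` with at most `concurrency` in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(item):
        async with semaphore:
            return await worker(item)

    # gather preserves input order, which keeps reruns diff-friendly.
    return await asyncio.gather(*(guarded(item) for item in items))

async def fake_probe_call(item):
    await asyncio.sleep(0.01)  # stand-in for a model round trip
    return {"input": item, "ok": True}

results = asyncio.run(run_bounded(range(25), fake_probe_call, concurrency=10))
print(len(results))  # 25
```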

### Determinism

Same inputs → identical outputs:

- Run IDs are content hashes
- Timestamps derive from the run ID
- Artifacts have stable formatting

This enables CI diff-gating: rerunning with unchanged inputs produces byte-identical artifacts, so any diff against a committed baseline signals a real behavioural change.
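
A sketch of the content-hashing idea: serialize the inputs canonically, hash them, and derive anything time-like from the hash. Which fields insideLLMs actually hashes, and its timestamp scheme, are assumptions here:

```python
import hashlib
import json

def run_id(config: dict, dataset: list) -> str:
    """Content-addressed run ID: identical inputs always yield the same ID.
    The fields included here are illustrative assumptions."""
    canonical = json.dumps({"config": config, "dataset": dataset},
                           sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

rid = run_id({"model": "gpt-4o", "probe": "sycophancy"}, ["a", "b"])
# Deriving a pseudo-timestamp from the run ID keeps artifacts byte-stable
# across reruns (illustrative scheme, not the library's actual one).
pseudo_timestamp = int(rid[:8], 16)
print(rid, pseudo_timestamp)
```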

## When to Read These

- New to insideLLMs? Start with Getting Started first
- Building something? Check Tutorials for hands-on guides
- Need details? Use Reference for complete API docs
- Want to understand how it all works? You’re in the right place!
