# Concepts

Deep dives into how insideLLMs works.

## Core Concepts

| Concept | Description |
| --- | --- |
| Models | Unified interface for LLM providers |
| Probes | Focused behavioural tests |
| Runners | Orchestration and execution |
| Datasets | Input formats and loading |
| Determinism | Reproducibility guarantees |
| Artifacts | Output files and schemas |

## How It All Fits Together

```mermaid
graph TB
    subgraph Inputs
        Config[Config YAML]
        Dataset[Dataset]
        APIKeys[API Keys]
    end

    subgraph Framework
        Registry[Registry]
        Runner[Runner]
        Model[Model]
        Probe[Probe]
    end

    subgraph Outputs
        Records[records.jsonl]
        Summary[summary.json]
        Report[report.html]
    end

    Config --> Registry
    APIKeys --> Model
    Registry --> Runner
    Dataset --> Runner
    Model --> Runner
    Probe --> Runner
    Runner --> Records
    Records --> Summary
    Records --> Report
```
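
The `Records --> Summary` and `Records --> Report` edges are pure aggregation: every probe outcome lands as a line in `records.jsonl`, and the summary is computed from those lines. A minimal sketch of that step, assuming a hypothetical record schema with a boolean `passed` field:

```python
import json

# Hypothetical records; the real schema is defined by the framework.
records = [
    {"run_id": "abc123", "probe": "sycophancy", "passed": True},
    {"run_id": "abc123", "probe": "sycophancy", "passed": False},
]

# records.jsonl: one JSON object per line, keys in stable order.
with open("records.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec, sort_keys=True) + "\n")

# summary.json: aggregates derived entirely from the records.
passed = sum(r["passed"] for r in records)
summary = {"total": len(records), "passed": passed,
           "pass_rate": passed / len(records)}
with open("summary.json", "w") as f:
    json.dump(summary, f, indent=2, sort_keys=True)
```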

## Quick Concept Overview

### Models

All LLM providers (OpenAI, Anthropic, local) share a unified interface:

```python
model.generate("prompt")  # Text completion
model.chat([messages])    # Multi-turn conversation
model.info()              # Provider metadata
```
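
One way to picture the unified interface is as a structural type every provider satisfies. The Protocol below is an illustrative sketch built from the three calls above; the parameter and return types are assumptions, not the library's actual signatures:

```python
from typing import Protocol

class ModelLike(Protocol):
    """Sketch of the shared surface; method names come from the docs above,
    but the type annotations are assumptions."""

    def generate(self, prompt: str) -> str:
        """Single-shot text completion."""
        ...

    def chat(self, messages: list[dict]) -> str:
        """Multi-turn conversation over role/content messages (assumed shape)."""
        ...

    def info(self) -> dict:
        """Provider metadata, e.g. provider name and model ID (assumed keys)."""
        ...
```

Because every provider exposes the same surface, probes and runners never need to branch on which backend they are talking to.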

### Probes

Probes test specific behaviours:

```python
probe.run(model, input_data)     # Execute one test
probe.run_batch(model, dataset)  # Execute many
probe.score(results)             # Evaluate outcomes
```
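
A probe can be as small as a deterministic check over the model's output. The class below is a self-contained sketch of that contract, with a stub model standing in for a real provider; the probe name and result fields are invented for illustration:

```python
class EchoModel:
    """Stub provider for the demo (assumption, not a real backend)."""
    def generate(self, prompt: str) -> str:
        return prompt.upper()

class RefusalProbe:
    """Illustrative probe: flags outputs that look like refusals."""

    def run(self, model, input_data: str) -> dict:
        output = model.generate(input_data)
        return {"input": input_data, "output": output,
                "refused": "CANNOT" in output}

    def run_batch(self, model, dataset: list) -> list:
        return [self.run(model, item) for item in dataset]

    def score(self, results: list) -> float:
        # Fraction of inputs the model answered rather than refused.
        return sum(not r["refused"] for r in results) / len(results)

probe = RefusalProbe()
results = probe.run_batch(EchoModel(), ["hello", "I cannot help with that"])
print(probe.score(results))  # 0.5
```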

### Runners

Runners orchestrate execution:

```python
runner = ProbeRunner(model, probe)
results = runner.run(dataset)
# or async:
results = await AsyncProbeRunner(model, probe).run(dataset, concurrency=10)
```
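
The `concurrency=10` argument suggests a bounded fan-out. The snippet below shows that general pattern in plain `asyncio`; it is the standard technique, not necessarily AsyncProbeRunner's actual implementation:

```python
import asyncio

async def run_bounded(items, worker, concurrency: int = 10):
    """Run `worker` over `items` with at most `concurrency` in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(item):
        async with semaphore:
            return await worker(item)

    # gather preserves input order, which keeps reruns diff-friendly.
    return await asyncio.gather(*(guarded(item) for item in items))

async def fake_probe_call(item):
    await asyncio.sleep(0.01)  # stand-in for a model round trip
    return {"input": item, "ok": True}

results = asyncio.run(run_bounded(range(25), fake_probe_call, concurrency=10))
print(len(results))  # 25
```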

### Determinism

Same inputs → identical outputs:

- Run IDs are content hashes
- Timestamps derive from the run ID
- Artifacts have stable formatting

This enables CI diff-gating: rerunning with unchanged inputs produces byte-identical artifacts, so any diff against a committed baseline signals a real behavioural change.
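
A sketch of the content-hashing idea: serialize the inputs canonically, hash them, and derive anything time-like from the hash. Which fields insideLLMs actually hashes, and its timestamp scheme, are assumptions here:

```python
import hashlib
import json

def run_id(config: dict, dataset: list) -> str:
    """Content-addressed run ID: identical inputs always yield the same ID.
    The fields included here are illustrative assumptions."""
    canonical = json.dumps({"config": config, "dataset": dataset},
                           sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

rid = run_id({"model": "gpt-4o", "probe": "sycophancy"}, ["a", "b"])
# Deriving a pseudo-timestamp from the run ID keeps artifacts byte-stable
# across reruns (illustrative scheme, not the library's actual one).
pseudo_timestamp = int(rid[:8], 16)
print(rid, pseudo_timestamp)
```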

## When to Read These

- New to insideLLMs? Start with Getting Started first
- Building something? Check Tutorials for hands-on guides
- Need details? Use Reference for complete API docs
- Want to understand how it all works? You’re in the right place!
