Understanding Outputs

What insideLLMs creates and why.

Artefacts

File	Purpose
`records.jsonl`	Every input/output pair (canonical)
`manifest.json`	Run metadata
`summary.json`	Aggregated metrics
`report.html`	Visual comparison
`diff.json`	Change detection (via `insidellms diff`)

records.jsonl

One JSON object per line, one line per result:

{"example_id": "0", "input": {"question": "What is 2 + 2?"}, "output": "4", "status": "success"}
{"example_id": "1", "input": {"question": "Is the sky blue?"}, "output": "Yes", "status": "success"}

Key fields:

run_id - Deterministic hash (same inputs = same ID); not shown in the sample lines above
example_id - Input identifier
input - Original data
output - Model response
status - success or error
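
Because each line is an independent JSON object, the file can be processed as a stream. The following is a minimal sketch, not part of insideLLMs itself: the helper names `iter_records` and `status_counts` are hypothetical, and it relies only on the `status` field described above.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-blank JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def status_counts(lines: Iterable[str]) -> Counter:
    """Count records by their "status" field."""
    return Counter(record["status"] for record in iter_records(lines))


# The two sample records shown above, used here in place of a real file.
sample = [
    '{"example_id": "0", "input": {"question": "What is 2 + 2?"}, "output": "4", "status": "success"}',
    '{"example_id": "1", "input": {"question": "Is the sky blue?"}, "output": "Yes", "status": "success"}',
]
print(status_counts(sample))  # Counter({'success': 2})
```

For a real run, an open file handle can be passed instead of the list, for example `with open("records.jsonl", encoding="utf-8") as f: status_counts(f)`; the exact path of the run directory is an assumption.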

summary.json

Aggregated stats:

{
  "models": {
    "gpt-4o": {"success_rate": 0.98, "example_count": 100}
  }
}
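
One way to consume these stats is to flag models that fall below a chosen success rate. This is a sketch under assumptions: `models_below` is a hypothetical helper, the thresholds are arbitrary, and only the `models` and `success_rate` keys from the sample above are assumed to exist.

```python
import json


def models_below(summary: dict, min_success_rate: float) -> list[str]:
    """Return names of models whose success_rate is under the threshold."""
    return sorted(
        name
        for name, stats in summary.get("models", {}).items()
        if stats["success_rate"] < min_success_rate
    )


# The sample summary shown above, parsed from a string rather than a file.
summary = json.loads(
    '{"models": {"gpt-4o": {"success_rate": 0.98, "example_count": 100}}}'
)
print(models_below(summary, 0.95))  # []
print(models_below(summary, 0.99))  # ['gpt-4o']
```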

report.html

A standalone HTML comparison report. Open it directly in a browser; no server is needed.

diff.json

Written when two run directories are compared:

insidellms diff baseline/ candidate/

Example contents:

{
  "changes": [
    {"example_id": "42", "field": "output",
     "baseline": "The answer is 4", "candidate": "The answer is four"}
  ],
  "summary": {"total_examples": 100, "changed": 3}
}
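
The structure above is straightforward to post-process. The sketch below assumes only the keys visible in the sample (`changes`, `summary.total_examples`, `summary.changed`); `describe_changes` and `changed_fraction` are hypothetical helpers, not insideLLMs functions.

```python
import json


def describe_changes(diff: dict) -> list[str]:
    """Format each entry in "changes" as a one-line description."""
    return [
        f'{c["example_id"]}.{c["field"]}: {c["baseline"]!r} -> {c["candidate"]!r}'
        for c in diff.get("changes", [])
    ]


def changed_fraction(diff: dict) -> float:
    """Share of examples that changed, or 0.0 when there are no examples."""
    summary = diff.get("summary", {})
    total = summary.get("total_examples", 0)
    return summary.get("changed", 0) / total if total else 0.0


# The sample diff shown above, parsed from a string rather than a file.
diff = json.loads("""
{
  "changes": [
    {"example_id": "42", "field": "output",
     "baseline": "The answer is 4", "candidate": "The answer is four"}
  ],
  "summary": {"total_examples": 100, "changed": 3}
}
""")

for line in describe_changes(diff):
    print(line)  # 42.output: 'The answer is 4' -> 'The answer is four'
print(changed_fraction(diff))  # 0.03
```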

For CI, add --fail-on-changes to make the command fail when anything differs:

insidellms diff baseline/ candidate/ --fail-on-changes
# Exits with status 1 if changes are detected

Why Deterministic?

The same inputs produce byte-for-byte identical outputs.

This enables:

CI diff-gating (block regressions)
Reproducibility (re-run = same artefacts)
Caching (skip computed results)
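
To illustrate why canonical serialization matters for all three points, here is a generic sketch of deriving an ID from content. It is not insideLLMs's actual hashing scheme, which this page does not describe: `canonical_bytes` and `content_id` are hypothetical helpers, and SHA-256 over sorted-key JSON is an assumed choice.

```python
import hashlib
import json


def canonical_bytes(obj) -> bytes:
    """Serialize with sorted keys and fixed separators so equal data gives equal bytes."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def content_id(obj) -> str:
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()


# Same data, different key order: the IDs match.
a = {"model": "gpt-4o", "input": {"question": "What is 2 + 2?"}}
b = {"input": {"question": "What is 2 + 2?"}, "model": "gpt-4o"}
assert content_id(a) == content_id(b)

# Any change to the data ("other" is a made-up model name) gives a different ID.
assert content_id(a) != content_id({**a, "model": "other"})
```

With an ID like this, a cache can skip work whose ID it has already seen, and a diff can compare artefacts byte for byte without normalizing them first.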

CI Integration Tutorial → Block regressions in CI.