Artifacts
insideLLMs produces structured artifacts for analysis and CI integration.
Overview
```mermaid
graph TD
    Run[Runner Execution] --> Records[records.jsonl]
    Run --> Manifest[manifest.json]
    Run --> Config[config.resolved.yaml]
    Records --> Summary[summary.json]
    Records --> Report[report.html]
    Records --> Diff[diff.json]
    Baseline[Baseline records] --> Diff
```
Artifact Types
| Artifact | Purpose | When Created |
|---|---|---|
| `records.jsonl` | Raw results | During run |
| `manifest.json` | Run metadata | After completion |
| `config.resolved.yaml` | Full config | Start of run |
| `summary.json` | Aggregated stats | After completion |
| `report.html` | Human report | On request |
| `diff.json` | Run comparison | Via `insidellms diff` |
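A quick way to see which of these artifacts a run directory currently holds is to test for each file name. This is a minimal sketch that assumes every artifact is written directly into the run directory. That layout is an assumption (the table does not say where `diff.json` is written). `inventory` and `EXPECTED_ARTIFACTS` are hypothetical names, not part of insideLLMs.

```python
from pathlib import Path

# File names from the Artifact Types table. Assumption: all of them
# live directly inside the run directory.
EXPECTED_ARTIFACTS = (
    "records.jsonl",
    "manifest.json",
    "config.resolved.yaml",
    "summary.json",
    "report.html",
    "diff.json",
)


def inventory(run_dir):
    """Map each known artifact name to whether that file exists in run_dir."""
    root = Path(run_dir)
    return {name: (root / name).is_file() for name in EXPECTED_ARTIFACTS}
```

`report.html` and `diff.json` only appear after you request them, so a `False` for those is normal.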
records.jsonl
The canonical output: one JSON object per line, one line per result. A single record, pretty-printed here for readability:
```json
{
  "schema_version": "1.0.0",
  "run_id": "a1b2c3d4...",
  "started_at": "2009-03-14T15:09:26.535897+00:00",
  "completed_at": "2009-03-14T15:09:26.535898+00:00",
  "model": {
    "model_id": "gpt-4o",
    "provider": "openai"
  },
  "probe": {
    "probe_id": "logic"
  },
  "dataset": {
    "dataset_id": "test.jsonl",
    "dataset_hash": "sha256:abc123..."
  },
  "example_id": "0",
  "input": {"question": "What is 2 + 2?"},
  "output": "4",
  "status": "success",
  "error": null,
  "error_type": null
}
```
Key Fields
| Field | Description |
|---|---|
| `schema_version` | Artifact schema version |
| `run_id` | Deterministic run identifier |
| `started_at` | Start timestamp, derived deterministically from `run_id` |
| `model` | Model specification |
| `probe` | Probe specification |
| `example_id` | Input identifier |
| `input` | Original input data |
| `output` | Model/probe output |
| `status` | `"success"` or `"error"` |
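Because each line is a self-contained JSON object, a record can be sanity-checked on its own. The sketch below uses a hypothetical `check_record` helper to look for the fields in the table above and for a valid `status`. The rule that `error` is `null` on success is inferred from the example record, not from a documented contract.

```python
import json

# Fields from the Key Fields table.
REQUIRED_FIELDS = ("schema_version", "run_id", "started_at", "model",
                   "probe", "example_id", "input", "output", "status")


def check_record(line):
    """Parse one records.jsonl line and return a list of problems (empty if none)."""
    record = json.loads(line)
    problems = [f"missing field: {name}"
                for name in REQUIRED_FIELDS if name not in record]
    status = record.get("status")
    if status not in ("success", "error"):
        problems.append(f"unexpected status: {status!r}")
    elif status == "success" and record.get("error") is not None:
        # Assumption: successful records carry error = null, as in the example.
        problems.append("status is success but error is set")
    return problems
```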
manifest.json
Run-level metadata:
```json
{
  "schema_version": "1.0.0",
  "run_id": "a1b2c3d4...",
  "created_at": "2009-03-14T15:09:26.535897+00:00",
  "started_at": "2009-03-14T15:09:26.535897+00:00",
  "completed_at": "2009-03-14T15:09:26.535899+00:00",
  "run_completed": true,
  "library_version": "0.1.0",
  "python_version": "3.11.0",
  "platform": "macOS-14.0-arm64",
  "model": {...},
  "probe": {...},
  "dataset": {...},
  "record_count": 100,
  "success_count": 98,
  "error_count": 2,
  "records_file": "records.jsonl"
}
```
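The manifest's counters can be cross-checked against the records file it points to. The sketch makes two assumptions. First, `records_file` is a path relative to the run directory. Second, every record is either a success or an error, so `success_count + error_count == record_count`, as in the example above. `check_manifest` is a hypothetical helper.

```python
import json
from pathlib import Path


def check_manifest(run_dir):
    """Cross-check manifest.json counts against the records file it names."""
    root = Path(run_dir)
    manifest = json.loads((root / "manifest.json").read_text(encoding="utf-8"))
    problems = []
    if not manifest.get("run_completed", False):
        problems.append("run did not complete")
    # Assumption: each record is counted as exactly one success or one error.
    if manifest["success_count"] + manifest["error_count"] != manifest["record_count"]:
        problems.append("success_count + error_count != record_count")
    # Assumption: records_file is relative to the run directory.
    records_path = root / manifest["records_file"]
    with records_path.open(encoding="utf-8") as f:
        n_lines = sum(1 for line in f if line.strip())
    if n_lines != manifest["record_count"]:
        problems.append(f"record_count is {manifest['record_count']} "
                        f"but {records_path.name} has {n_lines} records")
    return problems
```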
config.resolved.yaml
The fully resolved configuration:
```yaml
model:
  type: openai
  args:
    model_name: gpt-4o
    temperature: 0.7
probe:
  type: logic
  args: {}
dataset:
  format: jsonl
  path: /absolute/path/to/data.jsonl
  dataset_hash: sha256:abc123...
```
Useful for:
- Reproducing runs exactly
- Debugging path resolution
- Auditing configurations
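When reproducing a run, it helps to confirm the dataset has not changed since then. This sketch assumes `dataset_hash` is a SHA-256 over the file's raw bytes with a `sha256:` prefix. insideLLMs may hash differently (for example, after normalizing records), so treat a mismatch as a prompt to investigate rather than proof of change. `sha256_digest` and `dataset_unchanged` are hypothetical names.

```python
import hashlib


def sha256_digest(path, chunk_size=1 << 16):
    """Return 'sha256:<hex>' for the raw bytes of the file at path."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large datasets are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return "sha256:" + h.hexdigest()


def dataset_unchanged(path, recorded_hash):
    """True if the file's digest matches the recorded dataset_hash.

    Assumption: the recorded hash covers the raw file bytes.
    """
    return sha256_digest(path) == recorded_hash
```

Pass it the `path` and `dataset_hash` values from `config.resolved.yaml`.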
summary.json
Aggregated statistics:
```json
{
  "schema_version": "1.0.0",
  "run_id": "a1b2c3d4...",
  "models": {
    "gpt-4o": {
      "success_rate": 0.98,
      "example_count": 100,
      "error_count": 2
    }
  },
  "probes": {
    "logic": {
      "success_rate": 0.98
    }
  },
  "overall": {
    "success_rate": 0.98,
    "total_examples": 100
  }
}
```
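The statistics in `summary.json` can be recomputed from the records, which makes a useful consistency check. This is an illustrative sketch, not insideLLMs' own aggregation code. It assumes `success_rate` is successes divided by examples and that every record whose status is not `"success"` counts as an error. `summarize` is a hypothetical name.

```python
from collections import defaultdict


def summarize(records):
    """Recompute summary.json-style statistics from a list of parsed records."""

    def rate(successes, total):
        return successes / total if total else 0.0

    # model_id / probe_id -> [successes, total]
    per_model = defaultdict(lambda: [0, 0])
    per_probe = defaultdict(lambda: [0, 0])
    for r in records:
        ok = r["status"] == "success"
        for bucket, key in ((per_model, r["model"]["model_id"]),
                            (per_probe, r["probe"]["probe_id"])):
            bucket[key][0] += ok
            bucket[key][1] += 1

    total = len(records)
    successes = sum(r["status"] == "success" for r in records)
    return {
        "models": {
            m: {"success_rate": rate(s, n), "example_count": n,
                # Assumption: every non-success record is an error.
                "error_count": n - s}
            for m, (s, n) in sorted(per_model.items())
        },
        "probes": {p: {"success_rate": rate(s, n)}
                   for p, (s, n) in sorted(per_probe.items())},
        "overall": {"success_rate": rate(successes, total),
                    "total_examples": total},
    }
```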
report.html
Standalone HTML report with:
- Model comparison tables
- Success/failure breakdown
- Individual response viewer
- Filtering and search
Open directly in any browser.
diff.json
Comparison between two runs:
```json
{
  "baseline_run_id": "abc123...",
  "candidate_run_id": "def456...",
  "baseline_path": "/path/to/baseline",
  "candidate_path": "/path/to/candidate",
  "changes": [
    {
      "example_id": "42",
      "field": "output",
      "baseline": "The answer is 4",
      "candidate": "The answer is four"
    }
  ],
  "summary": {
    "total_examples": 100,
    "changed": 3,
    "unchanged": 97,
    "added": 0,
    "removed": 0
  }
}
```
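A diff of this shape amounts to matching records by `example_id` and comparing a field. The sketch below shows one plausible way to produce it. It is not necessarily how `insidellms diff` works, and it assumes `total_examples` counts the union of both runs' example IDs. `diff_records` is a hypothetical helper.

```python
def diff_records(baseline, candidate, field="output"):
    """Compare two lists of records, matched by example_id, on a single field."""
    base = {r["example_id"]: r for r in baseline}
    cand = {r["example_id"]: r for r in candidate}
    shared = sorted(base.keys() & cand.keys())  # sorted for a stable change order
    changes = [
        {"example_id": eid, "field": field,
         "baseline": base[eid].get(field), "candidate": cand[eid].get(field)}
        for eid in shared
        if base[eid].get(field) != cand[eid].get(field)
    ]
    return {
        "changes": changes,
        "summary": {
            # Assumption: total counts every example_id seen in either run.
            "total_examples": len(base.keys() | cand.keys()),
            "changed": len(changes),
            "unchanged": len(shared) - len(changes),
            "added": len(cand.keys() - base.keys()),
            "removed": len(base.keys() - cand.keys()),
        },
    }
```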
Working with Artifacts
Reading Records
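```python
import json

# Each line of records.jsonl is one JSON object
records = []
with open("run_dir/records.jsonl", encoding="utf-8") as f:
    for line in f:
        if line.strip():  # skip blank lines, e.g. a trailing newline
            records.append(json.loads(line))
```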
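For large runs, streaming the file avoids holding every record in memory at once. Below is a minimal sketch with hypothetical helpers `iter_records` and `error_breakdown`. Grouping by `error_type` assumes that field holds a short label on failed records; the example record above only shows it as `null` on success.

```python
import json
from collections import Counter


def iter_records(path):
    """Yield parsed records one at a time instead of loading the whole file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def error_breakdown(path):
    """Count failed records by error_type (None if no type was recorded)."""
    return Counter(r.get("error_type")
                   for r in iter_records(path) if r["status"] == "error")
```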
Generating Summary
```bash
insidellms report ./run_dir --summary-only
```
Generating HTML Report
```bash
insidellms report ./run_dir
# Creates ./run_dir/report.html
```
Comparing Runs
```bash
insidellms diff ./baseline ./candidate
```
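For CI integration, a small script can turn a diff into a pass/fail signal. This sketch assumes you know where `insidellms diff` wrote `diff.json` (the path is passed in) and that its `summary` block has the keys shown in the diff.json example. `gate` and its `max_changed` budget are hypothetical; they are not options of the CLI.

```python
import json
import sys


def gate(diff_path, max_changed=0):
    """Return a process exit code: 1 if more examples differ than allowed, else 0."""
    with open(diff_path, encoding="utf-8") as f:
        summary = json.load(f)["summary"]
    # Changed, newly added, and removed examples all count as differences.
    differing = summary["changed"] + summary["added"] + summary["removed"]
    if differing > max_changed:
        print(f"{differing} example(s) differ from baseline "
              f"(allowed: {max_changed})", file=sys.stderr)
        return 1
    return 0
```

In a CI step, `sys.exit(gate("path/to/diff.json"))` fails the job whenever any example changed, appeared, or disappeared. Raise `max_changed` if some drift is acceptable.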
Schema Versions
Artifacts are versioned:
| Version | Changes |
|---|---|
| 1.0.0 | Initial schema |
| 1.0.1 | Added `run_completed` flag |
To read the version from a record:

```python
import json
record = json.loads(line)  # `line` is one line read from records.jsonl
version = record["schema_version"]
```
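Since the version is a dotted string, comparing it as a tuple of integers avoids string-ordering mistakes such as `"1.10.0" < "1.9.0"`. This minimal sketch assumes the versions follow semantic versioning (a major bump signals an incompatible change), which the table does not state. `parse_version` and `require_schema` are hypothetical.

```python
def parse_version(version):
    """Turn 'MAJOR.MINOR.PATCH' into a tuple of ints so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def require_schema(artifact, supported_major=1):
    """Return the parsed schema_version, or raise if the major version is unsupported.

    Assumption: only a major-version change breaks compatibility.
    """
    version = parse_version(artifact["schema_version"])
    if version[0] != supported_major:
        raise ValueError(f"unsupported schema_version: {artifact['schema_version']}")
    return version
```

For example, `require_schema(manifest) >= (1, 0, 1)` indicates whether `run_completed` should be present.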
Determinism
All artifacts are deterministic:
- Sorted JSON keys
- Consistent separators
- Timestamps derived from run_id
- Stable record ordering
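Sorted keys and fixed separators mean the same record always serializes to the same bytes, regardless of the order its keys were inserted. The sketch below illustrates the idea. The compact separators `(",", ":")` are an assumption and may not match what insideLLMs actually writes. `canonical_line` and `fingerprint` are hypothetical names.

```python
import hashlib
import json


def canonical_line(record):
    """Serialize one record with sorted keys and fixed (assumed) separators."""
    return json.dumps(record, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)


def fingerprint(records):
    """SHA-256 over canonical lines, in order: equal content gives an equal digest."""
    h = hashlib.sha256()
    for record in records:
        h.update(canonical_line(record).encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()
```

The fingerprint is computed over records in sequence, so it also depends on stable record ordering.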
See Determinism for details.
See Also
- Understanding Outputs - Beginner guide
- Determinism - Why artifacts are stable
- CLI Reference - Commands for working with artifacts