Determinism

Same inputs (config, dataset, and model responses) → byte-for-byte identical outputs.

Enables:

CI diff-gating (block regressions)
Reproducibility (re-running produces the same artifacts)
Caching (skip results that were already computed)
Debugging (isolate changes)

What’s Deterministic

Artifact	Deterministic?	How
`run_id`	Yes	SHA-256 of config + dataset
`started_at` / `completed_at`	Yes	Derived from run_id
`records.jsonl`	Yes	Stable JSON, sorted keys
`manifest.json`	Yes	Stable JSON, sorted keys
`diff.json`	Yes	Stable JSON, sorted keys
Model responses	No	API-dependent
Actual latency	No	Runtime-dependent

How It Works

Run ID Generation

graph LR
    Config[Config] --> Hash[SHA-256]
    Dataset[Dataset Hash] --> Hash
    Schema[Schema Version] --> Hash
    Hash --> RunID[run_id]

run_id = sha256(
    schema_version +
    model_spec +
    probe_spec +
    dataset_hash +
    probe_kwargs
)[:32]
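
The recipe above can be sketched in Python. This is a minimal illustration, not the insideLLMs implementation: the helper name `compute_run_id`, the choice to serialize the inputs as one canonical JSON payload before hashing, and the sample values are all assumptions.

```python
import hashlib
import json


def compute_run_id(schema_version, model_spec, probe_spec, dataset_hash, probe_kwargs):
    """Hypothetical helper: hash the run inputs into a 32-character hex ID."""
    payload = {
        "schema_version": schema_version,
        "model_spec": model_spec,
        "probe_spec": probe_spec,
        "dataset_hash": dataset_hash,
        "probe_kwargs": probe_kwargs,
    }
    # Canonical JSON (sorted keys, compact separators) so dict ordering
    # never changes the digest. The real serialization may differ.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:32]


# Illustrative values only; none of these come from a real config.
run_a = compute_run_id("1.0", {"type": "dummy"}, {"type": "logic"}, "sha256:abc", {"b": 2, "a": 1})
run_b = compute_run_id("1.0", {"type": "dummy"}, {"type": "logic"}, "sha256:abc", {"a": 1, "b": 2})
run_c = compute_run_id("1.0", {"type": "dummy"}, {"type": "logic"}, "sha256:def", {"a": 1, "b": 2})
```

Here `run_a` and `run_b` match because only the key order differs, while `run_c` differs because the dataset hash changed.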

Deterministic Timestamps

Timestamps are derived from the run_id, not wall-clock time:

base_time = hash_to_datetime(run_id)
item_started_at = base_time + (index * 2) microseconds
item_completed_at = base_time + (index * 2 + 1) microseconds

This ensures identical timestamps for identical runs.
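
One way to picture the scheme is the following sketch. The body of `hash_to_datetime` is an assumption (the source only names the function): it folds part of a SHA-256 digest into an offset from an arbitrary fixed epoch. The per-item offsets follow the formulas above.

```python
import hashlib
from datetime import datetime, timedelta, timezone


def hash_to_datetime(run_id):
    """Hypothetical mapping from a run_id to a fixed UTC datetime."""
    digest = hashlib.sha256(run_id.encode("utf-8")).digest()
    # Assumption: use the first 4 bytes as an offset in seconds
    # from an arbitrary fixed epoch (2000-01-01 UTC).
    offset_seconds = int.from_bytes(digest[:4], "big")
    return datetime(2000, 1, 1, tzinfo=timezone.utc) + timedelta(seconds=offset_seconds)


def item_timestamps(run_id, index):
    """Return (started_at, completed_at) for the item at position `index`."""
    base_time = hash_to_datetime(run_id)
    started = base_time + timedelta(microseconds=index * 2)
    completed = base_time + timedelta(microseconds=index * 2 + 1)
    return started, completed


s0, c0 = item_timestamps("example-run-id", 0)
s1, c1 = item_timestamps("example-run-id", 1)
```

Because every value depends only on `run_id` and `index`, the timestamps are strictly increasing across items and never overlap, yet repeat exactly on a re-run.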

Dataset Hashing

Local files are content-addressed:

dataset:
  format: jsonl
  path: data/test.jsonl
  # Computed: dataset_hash: sha256:abc123...

The dataset hash feeds into the run_id, so changing the data produces a different run_id.
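
Content addressing of a local file can be sketched as below. The helper name `hash_dataset`, the chunk size, and the sample file contents are assumptions; only the `sha256:` prefix format is taken from the example above.

```python
import hashlib
import os
import tempfile


def hash_dataset(path, chunk_size=65536):
    """Hypothetical helper: return 'sha256:<hex>' for the file's raw bytes."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large datasets do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.jsonl")
    with open(path, "wb") as handle:
        handle.write(b'{"question": "2+2?"}\n')
    first = hash_dataset(path)
    second = hash_dataset(path)  # same bytes, same hash
    with open(path, "ab") as handle:
        handle.write(b'{"question": "3+3?"}\n')
    changed = hash_dataset(path)  # appended a row, hash changes
```

Hashing raw bytes means that even a whitespace-only edit to the file counts as a different dataset.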

Stable JSON Output

JSON is emitted with:

Sorted keys (sort_keys=True)
Consistent separators (, and :)
No trailing whitespace
UTF-8 encoding

json.dumps(data, sort_keys=True, separators=(",", ":"))
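
A short demonstration of why these settings matter: two dicts built in different key orders serialize to the same string. The wrapper name `stable_dumps` and the sample records are illustrative.

```python
import json


def stable_dumps(data):
    """Serialize with sorted keys and compact separators, as in the call above."""
    return json.dumps(data, sort_keys=True, separators=(",", ":"))


# Same content, different insertion order (including the nested dict).
record_a = {"b": 2, "a": {"y": 1, "x": 0}}
record_b = {"a": {"x": 0, "y": 1}, "b": 2}

text_a = stable_dumps(record_a)
text_b = stable_dumps(record_b)

# Encode explicitly when writing bytes to disk.
encoded = text_a.encode("utf-8")
```

Both `text_a` and `text_b` are `{"a":{"x":0,"y":1},"b":2}`, since `sort_keys=True` also applies to nested objects.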

Determinism Controls (Strict Mode)

insideLLMs defaults to strict determinism controls for hashing and artifact emission. If you want a more permissive mode (or want host metadata persisted), you can disable them.

Config:

determinism:
  strict_serialization: false
  deterministic_artifacts: false

CLI:

insidellms run config.yaml --no-strict-serialization --no-deterministic-artifacts
insidellms harness harness.yaml --no-strict-serialization --no-deterministic-artifacts

Behavior:

strict_serialization: rejects non-deterministic hashing/fingerprinting inputs (for example, exotic objects or dict key collisions like 1 vs "1").
deterministic_artifacts: neutralizes host-dependent manifest fields by persisting python_version and platform as null.

If deterministic_artifacts is omitted, it defaults to the value of strict_serialization.
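
The two behaviors can be illustrated with a small sketch. Both helpers are hypothetical and are not part of the insideLLMs API: `find_key_collisions` shows the kind of ambiguity strict serialization guards against (using `str()` as a simplified stand-in for JSON key coercion), and `resolve_determinism` mirrors the default rule stated above.

```python
def find_key_collisions(mapping):
    """Hypothetical check: stringified keys that several original keys map to."""
    seen = {}
    for key in mapping:
        seen.setdefault(str(key), []).append(key)
    return {text: keys for text, keys in seen.items() if len(keys) > 1}


def resolve_determinism(config):
    """Hypothetical resolver for the `determinism` config section."""
    # Strict mode is the documented default.
    strict = config.get("strict_serialization", True)
    # deterministic_artifacts falls back to strict_serialization when omitted.
    artifacts = config.get("deterministic_artifacts", strict)
    return {"strict_serialization": strict, "deterministic_artifacts": artifacts}


collisions = find_key_collisions({1: "a", "1": "b"})
defaults = resolve_determinism({})
relaxed = resolve_determinism({"strict_serialization": False})
```

The keys `1` and `"1"` are distinct in Python but become the same JSON key, so hashing such a dict would depend on which entry wins.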

Volatile Fields

Some fields are intentionally excluded from the determinism guarantee:

Field	Why Excluded
`latency_ms`	Runtime-dependent
Actual wall-clock time	Not reproducible
`library_version`	May change
`platform`	Environment-dependent (or `null` in deterministic artifacts mode)
`command`	CLI invocation varies

These are stored as null or excluded from diff comparisons.
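
As an illustration of the idea, the sketch below nulls out volatile fields before two records are compared. The helper name `neutralize`, the sample records, and the flat record shape are assumptions; a real comparison may also handle nested fields.

```python
# Field names taken from the table above.
VOLATILE_FIELDS = frozenset({"latency_ms", "library_version", "platform", "command"})


def neutralize(record, volatile=VOLATILE_FIELDS):
    """Hypothetical helper: copy a record with volatile top-level fields set to None."""
    return {key: (None if key in volatile else value) for key, value in record.items()}


# Two records from the same logical run on different hosts (made-up values).
record_a = {"run_id": "abc", "output": "4", "latency_ms": 120, "platform": "linux"}
record_b = {"run_id": "abc", "output": "4", "latency_ms": 95, "platform": "darwin"}

raw_equal = record_a == record_b
neutral_equal = neutralize(record_a) == neutralize(record_b)
```

The raw records differ, but they compare equal once the volatile fields are neutralized.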

CI Diff-Gating

The determinism guarantees enable this workflow:

graph TD
    Baseline[Baseline Run] --> Store[Store in Repo]
    PR[Pull Request] --> Candidate[Candidate Run]
    Candidate --> Diff[insidellms diff]
    Store --> Diff
    Diff --> Decision{Changes?}
    Decision -->|No| Pass[Pass]
    Decision -->|Yes| Fail[Fail]

# In CI
insidellms harness config.yaml --run-dir ./candidate
insidellms diff ./baseline ./candidate --fail-on-changes
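
To make the gating idea concrete, here is a conceptual Python sketch of a byte-for-byte comparison between two run directories. It is independent of `insidellms diff`, whose internals are not described here; the helper name `changed_artifacts` and the sample file contents are assumptions.

```python
import filecmp
import os
import tempfile

ARTIFACTS = ("records.jsonl", "manifest.json")


def changed_artifacts(baseline_dir, candidate_dir, names=ARTIFACTS):
    """Hypothetical gate: list artifacts that are missing or differ byte-for-byte."""
    changed = []
    for name in names:
        base = os.path.join(baseline_dir, name)
        cand = os.path.join(candidate_dir, name)
        if not (os.path.exists(base) and os.path.exists(cand)):
            changed.append(name)
        elif not filecmp.cmp(base, cand, shallow=False):  # compare contents
            changed.append(name)
    return changed


with tempfile.TemporaryDirectory() as root:
    baseline = os.path.join(root, "baseline")
    candidate = os.path.join(root, "candidate")
    for directory in (baseline, candidate):
        os.makedirs(directory)
        for name in ARTIFACTS:
            with open(os.path.join(directory, name), "w", encoding="utf-8") as handle:
                handle.write('{"ok":true}\n')
    unchanged = changed_artifacts(baseline, candidate)
    # Simulate a regression in the candidate run.
    with open(os.path.join(candidate, "records.jsonl"), "a", encoding="utf-8") as handle:
        handle.write('{"ok":false}\n')
    regressed = changed_artifacts(baseline, candidate)
```

A CI step built on this would exit non-zero whenever the returned list is non-empty, which is the role `--fail-on-changes` plays in the commands above.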

Best Practices

Do

Store baselines in version control
Use content-addressed datasets
Fix random seeds when possible
Use --fail-on-changes in CI

Don’t

Rely on wall-clock timestamps
Include latency in comparisons
Change config between baseline and candidate
Modify datasets without updating baseline

When Determinism Breaks

Symptom	Cause	Fix
Different run_id	Config or dataset changed	Verify inputs match
Different timestamps	Using old artifacts	Re-generate with current version
Diff shows changes	Model behavior changed	Update baseline or investigate
Flaky CI	Non-deterministic model	Use DummyModel for determinism tests

DummyModel for Testing

For purely deterministic tests, use DummyModel:

models:
  - type: dummy
    args:
      response: "Fixed response"

This produces byte-for-byte identical outputs every time.