This page explains how trace bundles and output fingerprinting appear in records.jsonl and how insidellms diff uses them for CI gating.
Trace data is opt-in: most runs will not include custom.trace unless you emit it explicitly (e.g., via TraceRecorder + trace_to_custom_field).
Trace bundles (ResultRecord.custom["trace"])
A trace bundle is a structured summary (and optionally a full event list) of a record’s execution trace. When present, it is stored under ResultRecord.custom["trace"] and validated against the insideLLMs.custom.trace@1 schema.
Key points:
- Schema name: insideLLMs.custom.trace@1 (stable key ordering; append-only fields)
- Location: ResultRecord.custom["trace"]
- Emitter: trace_to_custom_field (plus whatever instrumentation populates the events)
If you do not emit a trace bundle, trace-aware diff flags are effectively no-ops.
Where fingerprints and violations live
insidellms diff looks in two places:
- Structured bundle:
  - custom.trace.fingerprint.value (expects raw 64-hex or sha256:<hex>)
  - custom.trace.violations (TraceViolation schema)
- Legacy flat fields:
  - custom.trace_fingerprint
  - custom.trace_violations
If none of these fields are present, trace drift and trace violations are not evaluated.
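The lookup described above can be sketched as a small helper. This is an illustration only, not the tool's actual implementation: the function names are hypothetical, and three behaviours are assumptions rather than documented facts (the structured bundle is checked before the legacy field, the legacy field accepts the same two formats, and digests are compared in lowercase).

```python
import re
from typing import Any, Mapping, Optional

# A bare SHA-256 digest is 64 hexadecimal characters.
_HEX64 = re.compile(r"[0-9a-f]{64}")


def normalize_fingerprint(value: Any) -> Optional[str]:
    """Return a bare lowercase 64-hex digest, or None if the value is unusable."""
    if not isinstance(value, str):
        return None
    candidate = value.strip().lower()  # assumption: case is not significant
    if candidate.startswith("sha256:"):
        candidate = candidate[len("sha256:"):]
    return candidate if _HEX64.fullmatch(candidate) else None


def find_trace_fingerprint(custom: Mapping[str, Any]) -> Optional[str]:
    """Look in custom.trace.fingerprint.value, then in custom.trace_fingerprint."""
    trace = custom.get("trace")
    if isinstance(trace, Mapping):
        fingerprint = trace.get("fingerprint")
        if isinstance(fingerprint, Mapping):
            normalized = normalize_fingerprint(fingerprint.get("value"))
            if normalized is not None:
                return normalized
    # Assumption: the legacy flat field is only a fallback.
    return normalize_fingerprint(custom.get("trace_fingerprint"))
```

A record whose custom dict has neither field yields None here, which corresponds to the "not evaluated" case.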
Trace drift vs. trace violations
Trace drift: for a matching record (same model/probe/example), drift is reported when both baseline and candidate have trace fingerprints and they differ. This can catch behavioural changes even if the final output text stays the same.
- Diff flag: --fail-on-trace-drift
Trace violations increase: violations come from trace contract validation (e.g., missing tool result, invalid tool payload). Diff reports an increase when the candidate has a larger violation count than baseline for the same record.
- Diff flag: --fail-on-trace-violations
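The two checks can be expressed as simple predicates over a matched baseline/candidate pair. The sketch below uses hypothetical function names and assumes that a missing violations list counts as zero violations, which this page does not state.

```python
from typing import Optional, Sequence


def has_trace_drift(baseline_fp: Optional[str], candidate_fp: Optional[str]) -> bool:
    """Drift needs a fingerprint on both sides, and the two must differ."""
    if baseline_fp is None or candidate_fp is None:
        return False  # one side has no fingerprint: drift is not evaluated
    return baseline_fp != candidate_fp


def violations_increased(
    baseline: Optional[Sequence[object]],
    candidate: Optional[Sequence[object]],
) -> bool:
    """True when the candidate record has more violations than the baseline."""
    # Assumption: a missing list is treated as an empty one.
    return len(candidate or ()) > len(baseline or ())
```

Under this reading, a candidate that fixes violations (a lower count) or keeps the same count does not trip the violations check, and a record that gains a fingerprint for the first time does not count as drift.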
Output fingerprinting for structured outputs
insidellms diff compares outputs in this order:
1) If _output_text exists (e.g. output_text field, or a dict output containing output_text or text), diff compares only the extracted text.
2) Otherwise diff compares structured fingerprints (short 12-hex SHA-256 over canonical JSON).
The runner stores custom.output_fingerprint for non-string outputs and the diff uses it when no ignore list is provided.
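To make "short 12-hex SHA-256 over canonical JSON" concrete, here is a minimal sketch. The function name is hypothetical, and the canonical form used (sorted keys, compact separators, UTF-8) is an assumption, so the digests it produces may not match the values the runner stores.

```python
import hashlib
import json
from typing import Any


def short_fingerprint(output: Any, length: int = 12) -> str:
    """Hash a JSON-serialisable value and keep the first `length` hex characters."""
    # Assumed canonical form: sorted keys and no insignificant whitespace,
    # so that key order in the original dict does not affect the digest.
    canonical = json.dumps(
        output, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]
```

Because keys are sorted before hashing, two outputs that differ only in key order produce the same fingerprint, while any change to a value produces a different one.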
Ignoring volatile output fields
Use --output-fingerprint-ignore to drop keys before fingerprinting structured outputs:
insidellms diff .tmp/runs/base .tmp/runs/head \
--output-fingerprint-ignore timestamp,request_id
Notes:
- Keys are case-insensitive.
- The ignore list applies to any matching key at any depth (not path-based).
- When an ignore list is present, the diff recomputes fingerprints from output data (it will not use custom.output_fingerprint unless output is missing).
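The matching rules in the notes above (case-insensitive, any depth, not path-based) can be illustrated with a short recursive filter. The helper name is hypothetical, and the sketch assumes outputs are plain dicts and lists as produced by JSON decoding.

```python
from typing import Any, Iterable


def drop_ignored_keys(value: Any, ignore: Iterable[str]) -> Any:
    """Return a copy of `value` with ignored keys removed at every nesting level."""
    ignored = {key.lower() for key in ignore}  # case-insensitive comparison

    def _strip(node: Any) -> Any:
        if isinstance(node, dict):
            return {
                key: _strip(child)
                for key, child in node.items()
                if not (isinstance(key, str) and key.lower() in ignored)
            }
        if isinstance(node, list):
            return [_strip(item) for item in node]
        return node  # scalars pass through unchanged

    return _strip(value)
```

For example, with an ignore list of timestamp and request_id, the value {"Timestamp": 1, "data": {"request_id": "x", "v": 3}} reduces to {"data": {"v": 3}} before fingerprinting. Because matching is by key name only, a nested field that happens to share a name with a volatile top-level field is dropped as well.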
CI gating recommendations
- Strict determinism: emit trace bundles and use --fail-on-trace-drift.
- Contract enforcement: use --fail-on-trace-violations to block regressions in trace correctness.
- Structured outputs with noisy fields: use --output-fingerprint-ignore.
Example CI sequence:
insidellms harness ci/harness.yaml --run-dir .tmp/runs/base --overwrite --skip-report
insidellms harness ci/harness.yaml --run-dir .tmp/runs/head --overwrite --skip-report
insidellms diff .tmp/runs/base .tmp/runs/head \
--fail-on-changes \
--fail-on-trace-drift \
--output-fingerprint-ignore timestamp,request_id