No vibes. Only evidence.
The adoption and governance layer for the insideLLMs open-source toolchain: deterministic probes, response-level diffs, and CI gating for production confidence.
Probe-driven testing + deterministic artefacts + policy gates = controllable AI rollout.
Use the same command flow promoted in the upstream project docs, then layer governance review on top.
Start with insidellms quicktest "your prompt" to spot obvious failure modes quickly.
Run insidellms harness probes/financial.toml --output out/ to produce reproducible run artefacts.
Use insidellms diff baseline.json candidate.json --fail-on-changes to block unsafe drift before merge.
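The three commands above can be chained from a small script. The following is a minimal sketch, not part of the insideLLMs toolchain: the helper names build_flow and run_flow are hypothetical, and it assumes that each command signals failure with a non-zero exit code (in particular, that diff with --fail-on-changes exits non-zero when changes are found).

```python
import subprocess
from typing import List


def build_flow(prompt: str, probe_file: str, out_dir: str,
               baseline: str, candidate: str) -> List[List[str]]:
    """Return the three CLI invocations in the order described above.

    Only the commands and flags shown in the text are used.
    """
    return [
        ["insidellms", "quicktest", prompt],
        ["insidellms", "harness", probe_file, "--output", out_dir],
        ["insidellms", "diff", baseline, candidate, "--fail-on-changes"],
    ]


def run_flow(commands: List[List[str]]) -> int:
    """Run each command in order and stop at the first non-zero exit code.

    Assumption: a non-zero exit code means the step failed or drift was found.
    """
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return result.returncode
    return 0
```

A CI job could call run_flow(build_flow("your prompt", "probes/financial.toml", "out/", "baseline.json", "candidate.json")) and use the returned value as its own exit code, so that a failing diff blocks the merge.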
Canonical product documentation, command references, and implementation details for engineers integrating the library and CLI.
Read official docs
Positioning, rollout guidance, governance framing, and buyer-facing evidence language for platform, risk, and compliance stakeholders.
Open implementation map
The chain is the product: if the chain is incomplete, the claim is unactionable.
Every run stores inputs · environment · model config · tools · outputs as a single artefact.
Replay is the primitive. It is explicit, versioned, and recorded. Replay results are evaluated like any other engineering build artefact.
Baseline vs candidate outputs become the visible boundary between intended and introduced behaviour. This is the gate input.
The gate consumes diff artefacts and policy rules. Failing drift fails builds before it reaches users.
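To make the idea of a single run artefact concrete, here is a minimal sketch of a record that bundles the five parts named above and derives a stable digest from them. The RunArtefact class and its field layout are assumptions for illustration and do not reflect the actual insideLLMs artefact format.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class RunArtefact:
    """Hypothetical record: one run stored as a single artefact."""
    inputs: Dict[str, Any]
    environment: Dict[str, str]
    model_config: Dict[str, Any]
    tools: List[str]
    outputs: List[str]

    def canonical_json(self) -> str:
        # Sorted keys and fixed separators keep the serialisation stable
        # regardless of the order in which dict keys were inserted.
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))

    def digest(self) -> str:
        # Identical content always yields the same SHA-256 digest.
        return hashlib.sha256(self.canonical_json().encode("utf-8")).hexdigest()
```

Because the digest depends only on content, two runs with identical inputs, environment, model config, tools, and outputs can be compared by digest alone, while any change in one field produces a different value.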
insidellms diff baseline.json candidate.json --fail-on-changes
Diffs are not commentary; they are evidence. This pattern scales from one regression test to large harnesses.
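The diff-then-gate pattern can be sketched independently of the CLI. The example below assumes, purely for illustration, that a baseline and a candidate are each a mapping from probe name to response text; the functions diff_outputs and gate are hypothetical and are not the insideLLMs implementation.

```python
from typing import Dict, List, Optional


def diff_outputs(baseline: Dict[str, str],
                 candidate: Dict[str, str]) -> List[Dict[str, Optional[str]]]:
    """List every probe whose response differs between the two runs.

    A probe missing from one side is reported with None on that side.
    """
    changes = []
    for probe in sorted(set(baseline) | set(candidate)):
        before = baseline.get(probe)
        after = candidate.get(probe)
        if before != after:
            changes.append({"probe": probe, "baseline": before, "candidate": after})
    return changes


def gate(changes: List[Dict[str, Optional[str]]]) -> int:
    """Return a process-style exit code: 1 if any change exists, else 0."""
    return 1 if changes else 0
```

The same two functions work for one probe or for thousands, because the gate only looks at whether the list of changes is empty.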
This sample shows a regression that is semantically subtle but operationally significant. The change in tool path is now observable in the artefact.
A manifest proves what was run and why. If the manifest is not signed, the run is incomplete evidence.
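One way to attach a signature to a manifest is an HMAC over its canonical JSON form. This is a sketch under stated assumptions: the manifest is a plain dictionary, the key is a shared secret held by the CI system, and the function names are hypothetical. The source does not say which signing scheme insideLLMs uses, so HMAC-SHA256 is swapped in here only as a simple, standard-library example.

```python
import hashlib
import hmac
import json
from typing import Any, Dict


def sign_manifest(manifest: Dict[str, Any], key: bytes) -> str:
    """Return a hex HMAC-SHA256 signature over the canonical manifest JSON."""
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_manifest(manifest: Dict[str, Any], signature: str, key: bytes) -> bool:
    """Check the signature in constant time to avoid timing side channels."""
    expected = sign_manifest(manifest, key)
    return hmac.compare_digest(expected, signature)
```

A reviewer who holds the key can then reject any manifest whose contents were edited after the run, since even a one-character change invalidates the signature.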
Tools are part of behaviour. They need traceability as first-class events, not a side channel.
The risk lens only rephrases the evidence; it does not change the underlying facts.
Deterministic replay becomes trustworthy when the chain of events is inspectable.
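An inspectable chain of events, with tool calls recorded as first-class entries, can be approximated by a hash-chained log. The sketch below is an illustration only: the event fields, the GENESIS constant, and the functions append_event and verify_chain are assumptions, not the format insideLLMs records.

```python
import hashlib
import json
from typing import Any, Dict, List

# Placeholder "previous hash" for the first entry in a chain.
GENESIS = "0" * 64


def _entry_hash(prev_hash: str, event: Dict[str, Any]) -> str:
    """Hash an event together with the hash of the entry before it."""
    body = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_event(chain: List[Dict[str, Any]], event: Dict[str, Any]) -> None:
    """Append an event (for example a tool call) linked to the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "event": event,
        "prev": prev_hash,
        "hash": _entry_hash(prev_hash, event),
    })


def verify_chain(chain: List[Dict[str, Any]]) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Because each entry commits to the one before it, a replay can be checked step by step, and a silently altered tool call shows up as a verification failure rather than as an unexplained difference in output.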
Capture stable baseline runs for your critical flows and publish a minimal baseline policy.
Enable CI checks on high-risk suites and require explicit waiver flows.
Generate audit packs by environment and business domain for procurement, internal audit, and incident reviews.