insideLLMs — Evidence for model behaviour

No vibes. Only evidence.

Model behaviour that drifts silently is an operational risk.

The adoption and governance layer for the insideLLMs open-source toolchain: deterministic probes, response-level diffs, and CI gating for production confidence.

Probe-driven testing + deterministic artefacts + policy gates = controllable AI rollout.

Quick start

Use the same command flow promoted in the upstream project docs, then layer governance review on top.

Step 1

Run a fast confidence check

Start with insidellms quicktest "your prompt" to spot obvious failure modes quickly.

Step 2

Execute deterministic harness probes

Run insidellms harness probes/financial.toml --output out/ to produce reproducible run artefacts.

Step 3

Diff and gate in CI

Use insidellms diff baseline.json candidate.json --fail-on-changes to block unsafe drift before merge.

How this site complements the docs

GitHub docs site

Canonical product documentation, command references, and implementation details for engineers integrating the library and CLI.

Read official docs

This website

Positioning, rollout guidance, governance framing, and buyer-facing evidence language for platform, risk, and compliance stakeholders.

Open implementation map

Evidence chain

The chain is the product: if the chain is incomplete, the claim is unactionable.

Capture

Run manifest

Every run stores inputs · environment · model config · tools · outputs as a single artefact.

Manifest: run-id r_8f3a1d2 · hash 4b2af4c · signer kms:key-42
Mandatory trust anchor: if two teams run the same manifest and seed, outputs are comparable and reproducible.
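
As a minimal sketch of why a single artefact makes runs comparable: if the manifest is serialised canonically, equal content always yields the same hash. The field names and values below are hypothetical illustrations, not the insideLLMs manifest schema, and SHA-256 over sorted-key JSON is an assumed scheme.

```python
import hashlib
import json


def canonical_json(obj) -> bytes:
    """Encode obj as JSON with sorted keys and fixed separators,
    so that equal content always produces equal bytes."""
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    ).encode("utf-8")


def manifest_hash(manifest: dict) -> str:
    """Return the SHA-256 hex digest of the canonical encoding."""
    return hashlib.sha256(canonical_json(manifest)).hexdigest()


# Hypothetical manifest: keys mirror the categories named above
# (inputs, environment, model config, tools, outputs).
manifest_a = {
    "inputs": {"prompt": "Summarise the Q3 report"},
    "environment": {"image": "runner:1.0", "seed": 7},
    "model_config": {"model": "example-model", "temperature": 0},
    "tools": ["retrieval"],
    "outputs": ["example output"],
}

# Same content with the top-level keys in reverse order.
manifest_b = dict(reversed(list(manifest_a.items())))

# A manifest that differs only in the seed.
manifest_c = json.loads(json.dumps(manifest_a))
manifest_c["environment"]["seed"] = 8

same = manifest_hash(manifest_a) == manifest_hash(manifest_b)
different = manifest_hash(manifest_a) != manifest_hash(manifest_c)
```

Key order does not affect the hash, while any change to the seed does, which is the property two teams need before they compare outputs.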

Replay

Deterministic replay

Replay is the primitive. It is explicit, versioned, and recorded. Replay results are evaluated like any other engineering build artefact.

  • Environment lock-step
  • Tool-call snapshots
  • Stable output encoding
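
One way to picture tool-call snapshots is a recorder that stores each tool result on the live run and serves the stored result on replay. This is an illustrative sketch only: `ToolSnapshot` and its methods are hypothetical names, not part of the insideLLMs API.

```python
import json
from typing import Any, Callable, Dict, Optional


class ToolSnapshot:
    """Records tool results on a live run; serves them back on replay."""

    def __init__(self, recorded: Optional[Dict[str, Any]] = None):
        # Passing a recording switches the object into replay mode.
        self.replaying = recorded is not None
        self.recorded: Dict[str, Any] = dict(recorded or {})

    @staticmethod
    def _key(tool: str, args: dict) -> str:
        # Stable encoding: sorted keys and fixed separators.
        return json.dumps([tool, args], sort_keys=True, separators=(",", ":"))

    def call(self, tool: str, fn: Callable[..., Any], **args: Any) -> Any:
        key = self._key(tool, args)
        if self.replaying:
            if key not in self.recorded:
                raise KeyError(f"no snapshot recorded for {key}")
            return self.recorded[key]
        result = fn(**args)
        self.recorded[key] = result
        return result


def crm_lookup(customer_id: str) -> dict:
    # Stand-in for a live tool call.
    return {"customer_id": customer_id, "tier": "gold"}


def unavailable(**_: Any) -> dict:
    # Proves that replay never touches the live tool.
    raise RuntimeError("live tool must not be called during replay")


live = ToolSnapshot()
first = live.call("crm_lookup", crm_lookup, customer_id="c-1")

replay = ToolSnapshot(recorded=live.recorded)
second = replay.call("crm_lookup", unavailable, customer_id="c-1")
```

In replay mode a call with no recorded snapshot raises an error rather than falling through to the live tool, so a divergence in the tool-call path is surfaced instead of hidden.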

Diff

Behavioural diff as first-class data

Baseline vs candidate outputs become the visible boundary between intended and introduced behaviour. This is the gate input.

  • Token-level deltas
  • Semantic change score
  • Risk tags
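
Token-level deltas can be sketched with Python's standard `difflib`. The whitespace tokeniser and the output record shape are assumptions for illustration, and the ratio below is a purely lexical measure; a real semantic change score would need a separate model.

```python
import difflib
from typing import Dict, List


def token_deltas(baseline: str, candidate: str) -> List[Dict]:
    """Return the non-equal opcodes between two whitespace-tokenised strings."""
    a, b = baseline.split(), candidate.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    deltas = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            deltas.append({"op": op, "baseline": a[i1:i2], "candidate": b[j1:j2]})
    return deltas


def lexical_change_ratio(baseline: str, candidate: str) -> float:
    """1.0 minus difflib's similarity ratio over tokens (0.0 means identical)."""
    a, b = baseline.split(), candidate.split()
    return 1.0 - difflib.SequenceMatcher(a=a, b=b, autojunk=False).ratio()


baseline = "Refund approved after manager review"
candidate = "Refund approved without manager review"

deltas = token_deltas(baseline, candidate)
# deltas == [{"op": "replace", "baseline": ["after"], "candidate": ["without"]}]
```

The example changes a single token, yet it reverses the meaning of the sentence, which is why a lexical delta on its own is gate input rather than a verdict.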

Gate

CI enforcement

The gate consumes diff artefacts and policy rules. Drift that violates policy fails the build before it reaches users.

insidellms diff baseline.json candidate.json --fail-on-changes
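
The command above fails on any change. As a hedged sketch of how policy rules could refine that decision, the following combines diff records with blocking risk tags and explicit waivers. `DiffRecord`, `Policy`, and `gate` are hypothetical names and shapes, not the insideLLMs data model.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class DiffRecord:
    probe_id: str
    changed: bool
    risk_tags: Set[str] = field(default_factory=set)


@dataclass
class Policy:
    blocking_tags: Set[str]
    waived_probes: Set[str] = field(default_factory=set)


def gate(diffs: List[DiffRecord], policy: Policy) -> List[str]:
    """Return the probe ids whose drift should fail the build."""
    failures = []
    for diff in diffs:
        if not diff.changed or diff.probe_id in policy.waived_probes:
            continue
        # Any overlap between the diff's tags and the blocking tags fails.
        if diff.risk_tags & policy.blocking_tags:
            failures.append(diff.probe_id)
    return failures


diffs = [
    DiffRecord("refund-flow", changed=True, risk_tags={"financial"}),
    DiffRecord("greeting", changed=True, risk_tags={"tone"}),
    DiffRecord("kyc-check", changed=True, risk_tags={"financial"}),
    DiffRecord("faq", changed=False),
]
policy = Policy(blocking_tags={"financial"}, waived_probes={"kyc-check"})

failures = gate(diffs, policy)
exit_code = 1 if failures else 0  # a non-zero exit code fails the CI job
```

Here only `refund-flow` blocks the build: `greeting` changed under a non-blocking tag, `kyc-check` carries an explicit waiver, and `faq` did not change.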

Behavioural diff viewer

Diffs are not commentary; they are evidence. This pattern scales from one regression test to large harnesses.

In this sample the visible regression is semantically subtle but operationally significant. The change in the tool-call path is now observable in the artefact.

Governance mapping and accountability

Run Manifest

A manifest proves what was run and why. If the manifest is not signed, the run is incomplete evidence.

Inputs
prompt_hash=ae81b · model=llm-4.2 · temp=0 · tools=[retrieval,crm_lookup]
Signed @ 2026-02-13T10:22:18Z by k8s-runner-3
  • Environment lock details (seed, image, dependency lockfiles)
  • Tool boundary policy
  • Output schema and verifier rules
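
To illustrate what "signed" buys you, here is a minimal sketch of signing and verifying a manifest. It uses HMAC-SHA256 from the Python standard library as a stand-in; a production setup would more likely use an asymmetric key held in a KMS, and the function names and the demo key are assumptions, not the insideLLMs implementation.

```python
import hashlib
import hmac
import json


def sign_manifest(manifest: dict, key: bytes) -> str:
    """Return an HMAC-SHA256 signature over the canonical JSON encoding."""
    payload = json.dumps(
        manifest, sort_keys=True, separators=(",", ":")
    ).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_manifest(manifest: dict, signature: str, key: bytes) -> bool:
    """Constant-time check that the signature matches the manifest."""
    return hmac.compare_digest(sign_manifest(manifest, key), signature)


# Values taken from the sample inputs above; the key is a placeholder.
manifest = {
    "prompt_hash": "ae81b",
    "model": "llm-4.2",
    "temp": 0,
    "tools": ["retrieval", "crm_lookup"],
}
key = b"demo-key-not-for-production"

signature = sign_manifest(manifest, key)
valid = verify_manifest(manifest, signature, key)

# Any edit after signing invalidates the signature.
tampered = dict(manifest, temp=1)
tampered_valid = verify_manifest(tampered, signature, key)
```

Verification fails as soon as a single field changes after signing, which is what lets a reviewer treat the manifest as evidence rather than as a claim.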

Tool-augmented blast radius

Tools are part of behaviour. They need traceability as first-class events, not a side channel.

  • Tool inputs and return payload snapshots
  • External dependency versioning
  • Action class (read/write/transfer)
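
A sketch of tool calls as first-class events, assuming a simple record type. `ToolEvent`, `ActionClass`, and the severity ordering read < write < transfer are illustrative assumptions, not definitions taken from insideLLMs.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ActionClass(Enum):
    READ = "read"
    WRITE = "write"
    TRANSFER = "transfer"


# Assumed severity ordering, lowest to highest.
SEVERITY = [ActionClass.READ, ActionClass.WRITE, ActionClass.TRANSFER]


@dataclass(frozen=True)
class ToolEvent:
    tool: str
    version: str          # version of the external dependency
    action: ActionClass
    inputs: dict          # snapshot of what was sent to the tool
    returned: dict        # snapshot of what the tool returned


def blast_radius(events: List[ToolEvent]) -> Optional[ActionClass]:
    """Return the most severe action class seen in a run, or None if no tools ran."""
    if not events:
        return None
    return max((e.action for e in events), key=SEVERITY.index)


events = [
    ToolEvent("retrieval", "1.4.0", ActionClass.READ,
              {"query": "refund policy"}, {"hits": 3}),
    ToolEvent("crm_lookup", "2.1.0", ActionClass.WRITE,
              {"customer_id": "c-1", "note": "called"}, {"ok": True}),
]

worst = blast_radius(events)  # ActionClass.WRITE
```

Recording the action class on every event lets a reviewer see at a glance whether a run only read data or could have changed external state.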

The risk lens only re-phrases the evidence; it does not change the underlying truth.

Run timeline / trace explorer

Deterministic replay becomes trustworthy when the chain of events is inspectable.

Adoption path

1. Baseline lock

Capture stable baseline runs for your critical flows and publish a minimal baseline policy.

2. Enforce gates

Enable CI checks on high-risk suites and require an explicit waiver flow for any exception.

3. Package evidence

Generate audit packs by environment and business domain for procurement, internal audit, and incident reviews.

Credibility strip

Chain of custody: manifest hash, signature, timestamp, and verifier key.
Audit-ready export: manifest + diff + timeline + policy decision bundle.
Determinism report: replay error budget, model/runtime versions, and tool-call inventory.

Core Process Pipeline