How insideLLMs works
The core loop is simple: probe, compare, decide. The implementation is deterministic so your decisions are reproducible.
The core model: deterministic probes
The harness captures prompt input, runtime context, and model output for each probe. With fixed probe definitions and versions, your baseline and candidate runs remain comparable across releases.
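As an illustration of why fixed probe definitions keep runs comparable, here is a minimal sketch of a captured probe record. The `ProbeRecord` class and its field names are hypothetical and are not taken from the insideLLMs codebase; the assumption is only that a record carries the probe identity, the prompt, the runtime context, and the output.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeRecord:
    """Hypothetical shape of one captured probe execution."""
    probe_id: str
    probe_version: str
    prompt: str
    context: dict   # runtime context, e.g. model and tool settings
    output: str

    def key(self) -> str:
        """Stable identity derived only from the probe definition.

        Output and runtime context are deliberately excluded, so the same
        probe yields the same key in baseline and candidate runs.
        """
        payload = json.dumps(
            {
                "probe_id": self.probe_id,
                "probe_version": self.probe_version,
                "prompt": self.prompt,
            },
            sort_keys=True,
            separators=(",", ":"),
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Because the key ignores output and context, a baseline record and a candidate record for the same probe version can be paired directly, while a bumped probe version produces a different key and is treated as a different probe.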
- Quick confidence check: run insidellms quicktest against critical prompts.
- Probe harness execution: run deterministic probe suites for baseline and candidate builds.
- Response-level comparison: generate a structured diff with risk-relevant annotations.
- CI gate: fail builds when unexpected behavioural changes are detected.
The evidence object model
Run Manifest
A structured, immutable record of every parameter used during a run.
{
  "run_id": "r_8f3a1d2",
  "input_bundle": "prompt.v1",
  "environment": "runner.image@sha256:9d3",
  "model": "llm-4.2@b7b3",
  "tools": ["retrieval", "crm_lookup"],
  "output": "out_v2.json",
  "manifest_hash": "sha256:4b2af4c",
  "signature": "sig_ed25519_aa12"
}
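The page does not specify how `manifest_hash` is derived. One plausible approach, sketched here purely as an assumption, is to hash a canonical JSON serialisation of every field except the hash and signature themselves. The `manifest_hash` function below is a hypothetical helper, not part of the insideLLMs API.

```python
import hashlib
import json


def manifest_hash(manifest: dict) -> str:
    """Hash a manifest over a canonical JSON form (assumed scheme).

    The hash and signature fields are excluded because they are derived
    from the rest of the manifest.
    """
    body = {
        k: v for k, v in manifest.items()
        if k not in ("manifest_hash", "signature")
    }
    # Sorted keys and fixed separators make the bytes independent of
    # the order in which fields were written.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under this scheme, any change to a recorded parameter (for example a different model version) changes the hash, which is what makes the manifest useful as tamper-evident evidence.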
Diff Artefact
The diff is not a cosmetic view of the output; it is the artefact that opens the review. Each diff records:
- Token and semantic deltas
- Tool side effects and permissions
- Policy severity and reason code
- Reviewer attribution for exception decisions
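To make the fields above concrete, here is a minimal sketch of a response-level diff between two runs. It covers only a token-level delta, a reason code, and a severity label; semantic deltas and tool side effects are out of scope. `DiffEntry`, `diff_runs`, the reason codes, and the 0.5 severity threshold are all hypothetical choices for illustration, not the insideLLMs implementation.

```python
import difflib
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DiffEntry:
    """Hypothetical record for one changed probe."""
    probe_id: str
    reason_code: str
    similarity: float            # token-level similarity in [0, 1]
    severity: str
    reviewer: Optional[str] = None   # set when an exception is approved


def diff_runs(baseline: Dict[str, str], candidate: Dict[str, str]) -> List[DiffEntry]:
    """Compare outputs keyed by probe ID and return only the changes."""
    entries = []
    for probe_id in sorted(set(baseline) | set(candidate)):
        old, new = baseline.get(probe_id), candidate.get(probe_id)
        if old == new:
            continue
        if old is None:
            reason, similarity = "probe_added", 0.0
        elif new is None:
            reason, similarity = "probe_removed", 0.0
        else:
            reason = "output_changed"
            # Whitespace tokenisation is a simplification.
            matcher = difflib.SequenceMatcher(None, old.split(), new.split())
            similarity = matcher.ratio()
        severity = "high" if similarity < 0.5 else "low"
        entries.append(DiffEntry(probe_id, reason, similarity, severity))
    return entries
```

Iterating over sorted probe IDs keeps the diff itself deterministic, so two reviewers looking at the same pair of runs see the same entries in the same order.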
CI as a control surface
A CI gate links each merge request to measured behaviour. This is where deterministic testing shifts from optional validation to release policy.
insidellms harness probes/financial.toml --output out/baseline
insidellms harness probes/financial.toml --output out/candidate
insidellms diff out/baseline out/candidate --fail-on-changes
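The gating decision behind a fail-on-changes check can be sketched as a small function that turns a list of detected changes into a process exit code. This is an assumed policy (fail on any change that no reviewer has approved), and `gate` and the dictionary keys are hypothetical names, not the tool's actual behaviour.

```python
import sys


def gate(changes: list) -> int:
    """Return a CI exit code: 0 to pass, 1 to fail the build.

    Each change is a dict with a "probe_id" and, optionally, a
    "reason_code" and a "reviewer" who approved the exception.
    """
    unapproved = [c for c in changes if not c.get("reviewer")]
    for change in unapproved:
        reason = change.get("reason_code", "unknown")
        print(f"unexpected change: {change['probe_id']} ({reason})", file=sys.stderr)
    return 1 if unapproved else 0
```

In a pipeline script the result would typically be passed to `sys.exit`, so that an unapproved behavioural change blocks the merge in the same way a failing unit test would.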
Where to go deeper
Official docs
Use the GitHub Pages docs for canonical setup steps, golden path guidance, and command details.
Repository
Use the repo for source, examples, probe assets, and release notes.
What changes for your team
Engineers
Less guessing on model updates. You get concrete, reproducible regressions tied to a patch and a run ID.
Security
Tool calls and outputs are linked, with audit references on every run.
Compliance
Review packets are repeatable and packageable for internal audit and procurement due diligence.