insideLLMs — Evidence for model behaviour

Documentation map

Use this page as a bridge: canonical technical docs live in the upstream repository; this site adds positioning and governance overlays for rollout teams.

Start with canonical sources

Website design system starter

Tokens

Colour, spacing, focus, and typographic rules are defined in assets/style.css. Treat that file as the single source of truth for all pages.

  • Typography: UI sans + code mono family
  • Spacing scale: 4, 8, 12, 16, 24, 32, 48, 64
  • Radii: 4, 8, 12, 16

Components

  • Buttons: primary, ghost, danger, disabled
  • Badges: pass, warn, fail
  • Callouts, code blocks, tables, timeline, diff panel, evidence stamp
  • Interactive controls: lens switch, theme switch, diff mode

Getting started workflow

  1. Run insidellms quicktest on high-risk prompts for immediate confidence signals.
  2. Execute insidellms harness with a stable probe suite for baseline and candidate outputs.
  3. Generate a behavioural comparison with insidellms diff.
  4. Fail CI on unacceptable changes using --fail-on-changes.
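
As an illustration of step 4, the sketch below wraps the diff command in a small Python gate that a CI job could call. It assumes, without confirmation from the upstream docs, that insidellms diff exits with a non-zero status when --fail-on-changes detects a change. The gate function, its run parameter, and DIFF_CMD are hypothetical names introduced here; the file names are copied from the example command flow on this page.

```python
import subprocess
from typing import Callable, Sequence

# Command copied from the example flow on this page.
DIFF_CMD = ["insidellms", "diff", "baseline.json", "candidate.json", "--fail-on-changes"]


def gate(
    argv: Sequence[str],
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True when the command exits with status 0.

    Assumption: a non-zero exit status means the diff found changes.
    The `run` parameter exists only so the gate can be exercised
    without the CLI installed.
    """
    completed = run(list(argv), check=False)
    return completed.returncode == 0


# In a CI script, the result would typically decide the job's exit code:
#     raise SystemExit(0 if gate(DIFF_CMD) else 1)
```

Injecting the runner keeps the gate logic testable on machines where insidellms is not installed.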

Evidence topics on this site

Run manifest schema

The manifest captures all data required to recreate a run: input hash, runtime stack, model revision, tools, signature, and timestamps.
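
To make the field list concrete, here is a minimal sketch of a manifest record in Python. It is not the upstream schema: the class name RunManifest, the field names, the split of timestamps into started_at and finished_at, and the choice of SHA-256 over a canonical JSON encoding for the input hash are all assumptions made for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


def input_hash(prompts: list[str]) -> str:
    """SHA-256 over a canonical JSON encoding of the probe inputs (assumed scheme)."""
    canonical = json.dumps(prompts, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class RunManifest:
    """Hypothetical manifest shape; field names are illustrative only."""

    input_hash: str
    runtime_stack: dict[str, str]
    model_revision: str
    tools: list[str]
    started_at: str
    finished_at: str
    signature: str = ""  # filled in by whatever signing step is used

    def to_json(self) -> str:
        # Sorted keys give a stable serialisation for hashing or signing.
        return json.dumps(asdict(self), sort_keys=True, indent=2)


manifest = RunManifest(
    input_hash=input_hash(["Can a force equal mass times acceleration?"]),
    runtime_stack={"python": "3.12", "os": "linux"},  # placeholder values
    model_revision="example-model@rev-1",  # placeholder value
    tools=[],
    started_at="2024-01-01T00:00:00Z",
    finished_at="2024-01-01T00:00:05Z",
)
print(manifest.to_json())
```

Hashing a canonical encoding rather than raw file bytes means the same prompts produce the same hash regardless of whitespace or key order in the probe file.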

Diff rules

Diff outputs are split into context, insert, and delete layers, each carrying policy tags. Every diff decision records its drift scores and a risk class.
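
The layering described above can be sketched with Python's standard difflib. This is an illustration, not the tool's algorithm: the function split_layers, the drift definition (one minus the SequenceMatcher similarity ratio), and the risk thresholds are assumptions, and policy tags are not modelled.

```python
import difflib


def split_layers(baseline: str, candidate: str) -> dict:
    """Split a line-level diff into context, insert, and delete layers."""
    base_lines = baseline.splitlines()
    cand_lines = candidate.splitlines()
    matcher = difflib.SequenceMatcher(a=base_lines, b=cand_lines, autojunk=False)

    layers = {"context": [], "insert": [], "delete": []}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            layers["context"].extend(base_lines[i1:i2])
        else:
            # "replace" feeds both layers; "insert" and "delete" feed one,
            # because the other slice is empty for those opcodes.
            layers["delete"].extend(base_lines[i1:i2])
            layers["insert"].extend(cand_lines[j1:j2])

    # Assumed drift metric: 0.0 for identical outputs, 1.0 for no overlap.
    drift = 1.0 - matcher.ratio()

    # Hypothetical thresholds, chosen only for this example.
    if drift < 0.1:
        risk_class = "low"
    elif drift < 0.4:
        risk_class = "medium"
    else:
        risk_class = "high"

    return {"layers": layers, "drift": drift, "risk_class": risk_class}


result = split_layers("a\nb\nc", "a\nx\nc")
print(result["layers"], result["risk_class"])
```

A line-level comparison is the simplest choice here; a real behavioural diff would likely compare at a finer or more semantic granularity.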

Audit packs

Build signed bundles containing the manifest, diff, timeline, and gate log so that every release decision can be verified independently.
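
One way to picture a signed bundle is the sketch below, which signs a canonical JSON encoding of the four artefacts with HMAC-SHA256 from the Python standard library. The bundle layout and the function names build_pack and verify_pack are hypothetical, and HMAC is used only as a stand-in: it needs a shared secret, so a pack meant for third-party verification would more plausibly use an asymmetric signature scheme.

```python
import hashlib
import hmac
import json


def canonical_bytes(payload: dict) -> bytes:
    """Stable byte encoding so the same payload always signs identically."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def build_pack(manifest: dict, diff: dict, timeline: list, gate_log: list, key: bytes) -> dict:
    """Bundle the four artefacts and attach an HMAC-SHA256 signature."""
    payload = {
        "manifest": manifest,
        "diff": diff,
        "timeline": timeline,
        "gate_log": gate_log,
    }
    signature = hmac.new(key, canonical_bytes(payload), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def verify_pack(pack: dict, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = hmac.new(key, canonical_bytes(pack["payload"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, pack["signature"])


pack = build_pack(
    manifest={"model_revision": "example-model@rev-1"},  # placeholder content
    diff={"risk_class": "low"},
    timeline=["harness", "diff"],
    gate_log=["pass"],
    key=b"demo-key-not-for-production",
)
print(verify_pack(pack, b"demo-key-not-for-production"))
```

Any change to the payload after signing, even to a single gate log entry, makes verification fail.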

Example command flow

Repo-aligned workflow

insidellms quicktest "Can a force equal mass times acceleration?"
insidellms harness probes/financial.toml --output out/baseline
insidellms harness probes/financial.toml --output out/candidate
insidellms report out/candidate/manifest.json --format markdown
insidellms diff baseline.json candidate.json --fail-on-changes