insideLLMs — Evidence for model behaviour

Purpose

insideLLMs is built for teams that need control, not confidence.

This site complements the core open-source project with rollout guidance, buyer-facing language, and governance framing. The product reality stays technical: probe, compare, decide.

What we believe

Evidence is the interface

Product value no longer lives in generic confidence scores. It lives in traceable manifests, reproducible outputs, and explicit policy outcomes.

Determinism is an operating constraint

Probe inputs and run artefacts must remain comparable across releases. Non-determinism belongs in exception handling, not in the default release flow.
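One way to keep run artefacts comparable across releases is to serialise them canonically and record a content digest. A minimal sketch of that idea, assuming nothing about the project's actual manifest schema (the field names here are illustrative):

```python
import hashlib
import json

def manifest_digest(artefact: dict) -> str:
    """Hash a canonical JSON serialisation so identical runs
    always produce identical digests, regardless of key order."""
    canonical = json.dumps(artefact, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

run = {
    "probe": "refusal-basic",
    "model": "example-model-v1",
    "seed": 0,
    "outputs": ["declined", "declined", "answered"],
}

# Key order must not affect the digest.
reordered = {k: run[k] for k in sorted(run)}
assert manifest_digest(run) == manifest_digest(reordered)
```

Any change to probe inputs or outputs changes the digest, which is what makes cross-release comparison mechanical rather than manual.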

Governance is release engineering

Review logs, diff outcomes, and documented waivers are part of software delivery for regulated workflows, not post-hoc paperwork.

How we describe the platform

Core operating loop

  • insidellms quicktest for immediate prompt-level checks
  • insidellms harness for deterministic probe execution
  • insidellms report for reviewer-readable run evidence
  • insidellms diff --fail-on-changes for release gating
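The gating step in the loop above amounts to a strict comparison between a baseline run and a candidate run. A hedged re-implementation of the `--fail-on-changes` semantics, not the actual CLI internals (the run shapes and probe names are illustrative):

```python
def diff_runs(baseline: dict, candidate: dict) -> list:
    """Return the probe names whose outputs changed between runs."""
    changed = []
    for probe, expected in baseline.items():
        if candidate.get(probe) != expected:
            changed.append(probe)
    return changed

def gate(baseline: dict, candidate: dict) -> int:
    """Exit-style code: 0 when behaviour is stable, 1 otherwise."""
    changes = diff_runs(baseline, candidate)
    for probe in changes:
        print(f"changed: {probe}")
    return 1 if changes else 0

baseline = {"refusal-basic": "declined", "math-trivial": "4"}
candidate = {"refusal-basic": "declined", "math-trivial": "5"}
# gate(baseline, candidate) returns 1, which a CI job
# would treat as a failed release check.
```

Returning a non-zero code on any behavioural change is what lets the diff step sit directly in a pipeline as a release gate.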

Primary stakeholders

  • Platform and model engineering teams
  • Risk, compliance, and internal audit
  • Procurement and delivery governance teams

Who this is not for

Not a prompt gallery

The objective is not to collect clever prompts. The objective is to measure behavioural stability over time.

Not an ungoverned chatbot demo

This site is intentionally evidence-heavy because model release decisions need artefacts, not screenshots and confidence claims.

Commitment to regulated buyers

  • We keep chain-of-custody fields explicit and exportable.
  • We treat policy outcomes as first-class evidence, not manual comments.
  • We preserve reproducibility so claims can be independently rechecked.
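Independent rechecking only works if the evidence carries its own verification data. A minimal sketch, assuming a manifest that records a SHA-256 digest of its payload (the field names are illustrative, not the project's export format):

```python
import hashlib
import json

def recheck(manifest: dict) -> bool:
    """Recompute the recorded digest from the payload alone,
    so a third party can verify the claim without trusting us."""
    payload = json.dumps(manifest["payload"], sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest() == manifest["digest"]

payload = {"probe": "refusal-basic", "outcome": "pass"}
manifest = {
    "payload": payload,
    "digest": hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode("utf-8")
    ).hexdigest(),
}
assert recheck(manifest)
```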

Roadmap principles

Roadmap priorities follow production workflows. If you want a stronger fit, bring one production workflow and we will map it into probes, diffs, and release controls.