insideLLMs — Evidence for model behaviour

Notes from the control plane

Practical guidance for teams moving from "prompt tuning" to governed model release operations.

Latest themes

No vibes. Only evidence.

Why behaviour validation needs deterministic inputs, stable artefacts, and versioned release criteria.

  • Confidence scores are not release controls
  • Probe suites outperform ad-hoc smoke testing
  • Run evidence must survive external review

Behavioural diffs as evidence bundles

How baseline-versus-candidate comparisons reduce ambiguous regressions and speed reviewer decisions.

  • Make output deltas visible and inspectable
  • Map each diff to a policy decision
  • Use non-zero exits to enforce intent

Tool call boundaries and blast radius

The most expensive failures usually come from action surfaces, not wording. Tool traces need first-class visibility.

  • Track read/write/transfer action classes
  • Treat new tool integrations as risk events
  • Review permission drift before deployment

Summaries

No vibes. Only evidence.

Release quality for LLM systems is measured by repeatability. If teams cannot rerun the same probe suite and compare the resulting outputs deterministically, they cannot make defensible release decisions.
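A minimal sketch of what "rerun and compare deterministically" can mean in practice. The `run_id` and `outputs_match` helpers, and the idea of hashing canonicalised inputs, are illustrative assumptions, not part of any particular tool: the point is that a run's identity is derived from pinned inputs, and comparison is byte-exact over a canonical serialisation.

```python
import hashlib
import json

def run_id(probe_inputs, model_version, seed):
    # Hypothetical helper: derive a stable identifier for a probe run so
    # reruns with identical inputs, model version, and seed match later.
    payload = json.dumps(
        {"inputs": probe_inputs, "model": model_version, "seed": seed},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def _canonical(outputs):
    # Canonical serialisation so comparison is independent of key order.
    return json.dumps(outputs, sort_keys=True)

def outputs_match(baseline_outputs, candidate_outputs):
    # Deterministic comparison: the same probe suite must yield
    # byte-identical serialised outputs to count as a repeatable run.
    return _canonical(baseline_outputs) == _canonical(candidate_outputs)

# Two reruns with the same pinned settings share one run identity:
rid_a = run_id(["probe-1", "probe-2"], "model-v3", seed=7)
rid_b = run_id(["probe-1", "probe-2"], "model-v3", seed=7)
assert rid_a == rid_b
```

Any change to inputs, model version, or seed produces a different `run_id`, which is what makes "same probes" an auditable claim rather than an assertion.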

Behavioural diffs as evidence bundles

Diffs should be stored as artefacts with decision context. They are not diagnostics alone; they are the review object used to approve, block, or waive changes.
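One way the above can be structured, sketched under assumed names (`DiffRecord`, `write_bundle`, `gate` are illustrative, not an existing API): each diff carries the policy it maps to and the reviewer's decision, the whole set is persisted as one artefact, and a non-zero gate value enforces a blocked decision.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class DiffRecord:
    """One behavioural diff plus the decision context reviewers need."""
    probe: str
    baseline_output: str
    candidate_output: str
    policy: str     # which release policy this diff is mapped to
    decision: str   # "approve" | "block" | "waive"

def write_bundle(diffs, path):
    # Persist diffs and decisions together as one reviewable artefact.
    with open(path, "w") as f:
        json.dump([asdict(d) for d in diffs], f, indent=2)

def gate(diffs):
    # Return a non-zero status when any diff is blocked,
    # so the pipeline enforces the review decision.
    return 1 if any(d.decision == "block" for d in diffs) else 0

# Example: one waived wording change, one blocked action change.
diffs = [
    DiffRecord("greeting-probe", "Hello.", "Hey!", "tone-policy", "waive"),
    DiffRecord("refund-probe", "Declined.", "Approved.", "action-policy", "block"),
]
assert gate(diffs) == 1
```

In CI, `sys.exit(gate(diffs))` turns a blocked diff into a failing job, which is the "non-zero exits to enforce intent" point from the theme above.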

Tool call boundaries and blast radius

Any expansion in callable tools should be treated as a material behavioural change. Governance requires visibility into tool permissions, inputs, and side effects.
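A sketch of the two checks implied here, under assumed names (`ACTION_CLASSES`, `classify_trace`, `permission_drift` and the example tool names are all hypothetical): classify each tool call into read/write/transfer action classes, flag unknown tools as risk events, and surface any drift between the approved tool set and what is callable now.

```python
# Hypothetical registry mapping each callable tool to an action class.
ACTION_CLASSES = {
    "search_docs": "read",
    "update_record": "write",
    "send_payment": "transfer",
}

def classify_trace(tool_calls):
    # Group a tool trace by action class; unknown tools are surfaced
    # as risk events rather than silently classified.
    report = {"read": [], "write": [], "transfer": [], "unknown": []}
    for call in tool_calls:
        report[ACTION_CLASSES.get(call, "unknown")].append(call)
    return report

def permission_drift(approved, current):
    # Tools callable now that were not approved at the last review.
    return sorted(set(current) - set(approved))

trace = ["search_docs", "send_payment", "shiny_new_tool"]
report = classify_trace(trace)
assert report["unknown"] == ["shiny_new_tool"]  # new tool => risk event
```

Reviewing `permission_drift(approved, current)` before deployment makes "any expansion in callable tools" a concrete, checkable diff rather than something discovered in production.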