# Production Shadow Capture

Capture a small, safe slice of production traffic in the canonical `records.jsonl` format, then replay and diff it through the same deterministic harness pipeline.
## Why Use Shadow Capture
- Build regression suites from real user traffic.
- Detect behavior drift earlier than synthetic-only testing.
- Reuse the same `run -> records -> report -> diff` spine.
## FastAPI Middleware Setup
```python
from fastapi import FastAPI

from insideLLMs import shadow

app = FastAPI()

# Mirror ~1% of HTTP traffic into ./shadow/records.jsonl, tagged with the
# production model/dataset identifiers and with request headers excluded.
app.middleware("http")(
    shadow.fastapi(
        output_path="./shadow/records.jsonl",
        sample_rate=0.01,
        model_id="prod-gpt4o",
        model_provider="openai",
        dataset_id="prod-traffic",
        include_request_headers=False,
    )
)
```
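Once the middleware is live, it is worth confirming that records are actually landing on disk before wiring up the replay step. The snippet below is a minimal check using only the standard library; it reads the `output_path` configured above and assumes nothing about the record schema beyond one JSON object per line.

```python
import json
from pathlib import Path

# Path matches the output_path configured in the middleware above.
records_path = Path("./shadow/records.jsonl")

# Count captured records and peek at the keys of the first one,
# without assuming anything about the record schema.
with records_path.open() as fh:
    records = [json.loads(line) for line in fh if line.strip()]

print(f"{len(records)} shadow records captured")
if records:
    print("first record keys:", sorted(records[0].keys()))
```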
## Sampling Strategy
- Start with `sample_rate=0.01` (1%) and increase only after a storage review.
- Keep sampling deterministic and stable across deploys (see the sketch after this list).
- Use separate output paths per service/environment.
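If `sample_rate` alone does not give you the stability you need, one common approach is to make the sampling decision from a hash of a stable request key (user ID, session ID) rather than a random draw, so the same key is sampled on every deploy. The helper below is a sketch of that idea, not part of the shadow API; `request_key` is whatever stable identifier you choose.

```python
import hashlib


def should_sample(request_key: str, sample_rate: float = 0.01) -> bool:
    # Hash the stable key and map it onto [0, 1); the same key always
    # gets the same decision, across processes and deploys.
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate


# Example: the same ~1% of sessions is captured on every deploy.
print(should_sample("session-1234"))
```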
## Privacy and Redaction Guidance
- Default to `include_request_headers=False`.
- Redact or hash sensitive request/response fields before writing (see the sketch after this list).
- Restrict access to shadow artifact directories.
- Treat shadow records as production data for retention/compliance policies.
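If you do not already have a redaction layer in front of the capture path, a pre-write scrub of known-sensitive fields is straightforward. The sketch below salts and hashes selected string values so records remain joinable without exposing raw data; the field names are placeholders, not part of any insideLLMs schema.

```python
import hashlib

SENSITIVE_KEYS = {"email", "user_id", "api_key"}  # placeholders; choose your own


def redact(record: dict, salt: str = "rotate-me") -> dict:
    """Return a copy of the record with sensitive string values salted and hashed."""
    scrubbed = {}
    for key, value in record.items():
        if key in SENSITIVE_KEYS and isinstance(value, str):
            scrubbed[key] = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
        elif isinstance(value, dict):
            scrubbed[key] = redact(value, salt)
        else:
            scrubbed[key] = value
    return scrubbed
```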
## Replay and Diff Workflow
```bash
# 1) Generate the baseline run from the approved prompt/model setup
insidellms harness ci/harness.yaml --run-dir .tmp/runs/base --overwrite

# 2) Generate the candidate run (new code/model/prompt)
insidellms harness ci/harness.yaml --run-dir .tmp/runs/head --overwrite

# 3) Gate on behavior drift
insidellms diff .tmp/runs/base .tmp/runs/head --fail-on-changes
```
For agentic/tool workflows, add:
```bash
insidellms diff .tmp/runs/base .tmp/runs/head --fail-on-trajectory-drift
```
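If you prefer to drive the same gate from a script rather than the shell, a thin wrapper around the commands above works; this is only a sketch, and a non-zero exit from `insidellms diff` is what fails the job.

```python
import subprocess
import sys


def run(cmd: list[str]) -> None:
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


# Same three steps as the shell workflow above.
run(["insidellms", "harness", "ci/harness.yaml", "--run-dir", ".tmp/runs/base", "--overwrite"])
run(["insidellms", "harness", "ci/harness.yaml", "--run-dir", ".tmp/runs/head", "--overwrite"])

try:
    run(["insidellms", "diff", ".tmp/runs/base", ".tmp/runs/head", "--fail-on-changes"])
except subprocess.CalledProcessError:
    sys.exit("behavior drift detected; failing the build")
```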
## Operational Tips
- Rotate `records.jsonl` outputs daily or by size (a rotation sketch follows this list).
- Attach source metadata (`service`, `env`, `version`) in `custom` fields upstream.
- Keep a reproducible baseline branch/reference for CI diffing.
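Rotation can be handled by your existing log tooling (for example, logrotate). If you want to keep it in-process, a size check is enough; the sketch below uses only the standard library, the 100 MB threshold is an arbitrary example, and it assumes the capture process reopens the file per write or is restarted after rotation.

```python
import time
from pathlib import Path

MAX_BYTES = 100 * 1024 * 1024  # example threshold; tune after your storage review


def rotate_if_needed(path: str = "./shadow/records.jsonl") -> None:
    """Rename the active shadow file once it exceeds the size threshold."""
    p = Path(path)
    if p.exists() and p.stat().st_size > MAX_BYTES:
        stamp = time.strftime("%Y%m%d-%H%M%S")
        p.rename(p.with_name(f"{p.stem}-{stamp}{p.suffix}"))
```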