Production Shadow Capture

Capture a small, safe slice of production traffic in canonical records.jsonl format, then replay and diff it through the same deterministic harness pipeline.

Why Use Shadow Capture

  • Build regression suites from real user traffic.
  • Detect behavior drift earlier than synthetic-only testing.
  • Reuse the same run -> records -> report -> diff spine.

FastAPI Middleware Setup

from fastapi import FastAPI
from insideLLMs import shadow

app = FastAPI()
app.middleware("http")(
    shadow.fastapi(
        output_path="./shadow/records.jsonl",
        sample_rate=0.01,
        model_id="prod-gpt4o",
        model_provider="openai",
        dataset_id="prod-traffic",
        include_request_headers=False,
    )
)

Sampling Strategy

  • Start with sample_rate=0.01 (1%) and increase only after storage review.
  • Keep sampling deterministic and stable across deploys.
  • Use separate output paths per service/environment.

Privacy and Redaction Guidance

  • Default to include_request_headers=False.
  • Redact or hash sensitive request/response fields before writing.
  • Restrict access to shadow artifact directories.
  • Treat shadow records as production data for retention/compliance policies.

Replay and Diff Workflow

# 1) Generate baseline run from approved prompt/model setup
insidellms harness ci/harness.yaml --run-dir .tmp/runs/base --overwrite

# 2) Generate candidate run (new code/model/prompt)
insidellms harness ci/harness.yaml --run-dir .tmp/runs/head --overwrite

# 3) Gate behavior drift
insidellms diff .tmp/runs/base .tmp/runs/head --fail-on-changes

For agentic/tool workflows, add:

insidellms diff .tmp/runs/base .tmp/runs/head --fail-on-trajectory-drift

Operational Tips

  • Rotate records.jsonl outputs daily or by size.
  • Attach source metadata (service, env, version) in custom fields upstream.
  • Keep a reproducible baseline branch/reference for CI diffing.