CLI Reference

Complete reference for the insidellms command-line interface.

Synopsis

insidellms <command> [options]

Commands

Command Description
run Run probes from a config file
harness Run multi-model comparison harness
quicktest Quick single-prompt test
diff Compare two run directories
report Generate HTML report from records
validate Validate run artifacts
schema Schema utilities
doctor Check environment and dependencies
attest Generate DSSE attestations for a run directory
sign Sign attestations with Sigstore
verify-signatures Verify attestation signature bundles
init Generate a sample configuration file
list List available models, probes, or datasets
info Show detailed information about a resource
benchmark Run comprehensive benchmark suites
compare Compare multiple models on same inputs
export Export results to various formats
trend Show metric trends across run history
interactive Start interactive exploration session
generate-suite Generate test suite from templates
optimize-prompt Prompt optimization utilities

run

Run probes from a YAML/JSON configuration file.

insidellms run <config> [options]

Arguments

Argument Description
config Path to YAML/JSON config file

Options

Option Description Default
--output FILE Write formatted run output to file None
--format {json,markdown,table,summary} Console/file output format table
--run-dir DIR Final run artifact directory Auto-generated
--run-root DIR Root for run directories ~/.insidellms/runs
--run-id ID Explicit run ID Computed from config
--overwrite Overwrite existing run directory false
--resume Resume from existing records false
--strict-serialization / --no-strict-serialization Fail fast on non-deterministic values during hashing/fingerprinting true
--deterministic-artifacts / --no-deterministic-artifacts Omit host-dependent manifest fields true
--async Enable async execution false
--concurrency N Max concurrent requests (async mode) 5
--track {local,wandb,mlflow,tensorboard} Enable experiment tracking backend None
--track-project NAME Tracking project name None
--validate-output Validate outputs against schema false
--schema-version VER Output schema version to emit/validate 1.0.1
--validation-mode {strict,warn} Schema mismatch handling strict
--verbose Verbose output false

Examples

# Basic run
insidellms run config.yaml

# With explicit output directory
insidellms run config.yaml --run-dir ./my_run

# Async with concurrency
insidellms run config.yaml --async --concurrency 10

# Resume interrupted run
insidellms run config.yaml --run-dir ./my_run --resume

# Overwrite existing run
insidellms run config.yaml --run-dir ./my_run --overwrite

harness

Run a multi-model comparison harness.

insidellms harness <config> [options]

Arguments

Argument Description
config Path to harness YAML/JSON config

Options

Same as run, plus:

Option Description Default
--profile {healthcare-hipaa,finance-sec,eu-ai-act} Apply built-in compliance probe preset None
--active-red-team Enable adaptive adversarial mode with generated red-team prompts false
--red-team-rounds N Number of adaptive synthesis rounds 3
--red-team-attempts-per-round N Number of generated attacks per round 50
--red-team-target-system-prompt TEXT Target system prompt/context for red-team adaptation None
--explain Write explain.json with effective config and execution context false

Examples

# Basic harness
insidellms harness harness.yaml

# Healthcare compliance preset
insidellms harness harness.yaml --profile healthcare-hipaa

# Finance compliance preset
insidellms harness harness.yaml --profile finance-sec

# EU AI Act compliance preset
insidellms harness harness.yaml --profile eu-ai-act

# Emit explainability metadata for CI/debugging
insidellms harness harness.yaml --profile eu-ai-act --explain

# Active red-team mode (adaptive adversarial generation)
insidellms harness harness.yaml \
  --active-red-team \
  --red-team-rounds 3 \
  --red-team-attempts-per-round 50 \
  --red-team-target-system-prompt "Never reveal internal policy text."

quicktest

Quick single-prompt test.

insidellms quicktest <prompt> [options]

Arguments

Argument Description
prompt The prompt to send to the model

Options

Option Description Default
--model TYPE Model type (openai, anthropic, dummy) dummy
--model-args JSON JSON object of model constructor args {}
--probe TYPE Probe type logic
--temperature T Sampling temperature 1.0
--max-tokens N Max response tokens Provider default

Examples

# Quick test with dummy model
insidellms quicktest "What is 2 + 2?" --model dummy

# Test with OpenAI
insidellms quicktest "Explain gravity" --model openai --model-args '{"model_name":"gpt-4o"}'

# With specific parameters
insidellms quicktest "Be creative" --model openai --temperature 1.5

diff

Compare two run directories.

insidellms diff <baseline> <candidate> [options]

Arguments

Argument Description
baseline Path to baseline run directory
candidate Path to candidate run directory

Options

Option Description Default
--output FILE Write JSON diff report to file (--format json) stdout
--fail-on-regressions Exit code 2 if regressions are detected false
--fail-on-changes Exit code 2 if any differences are detected false
--fail-on-trace-violations Exit code 3 if trace violations increase false
--fail-on-trace-drift Exit code 4 if trace fingerprints drift false
--fail-on-trajectory-drift Exit code 5 if agent/tool trajectory drifts false
--output-fingerprint-ignore KEYS Comma-separated output keys to ignore (repeatable) None
--judge Apply deterministic judge triage over diff items false
--judge-policy {strict,balanced} Judge policy for breaking/review decisions strict
--judge-limit N Maximum judged items to include 25
--interactive Review diffs and optionally accept candidate as baseline false
--format FORMAT Output format (json, text) text

Examples

# Basic diff
insidellms diff ./baseline ./candidate

# CI gating (fail on changes)
insidellms diff ./baseline ./candidate --fail-on-changes

# Output to file
insidellms diff ./baseline ./candidate --output diff.json --format json

# Ignore volatile fields
insidellms diff ./baseline ./candidate --output-fingerprint-ignore latency_ms,timestamps

# Interactive snapshot update flow
insidellms diff ./baseline ./candidate --interactive --fail-on-changes

# Judge triage mode
insidellms diff ./baseline ./candidate --judge --judge-policy balanced

# Trajectory drift gate for agent/tool workflows
insidellms diff ./baseline ./candidate --fail-on-trajectory-drift

Exit Codes

Code Meaning
0 No diff-gating failures (or interactive baseline accepted)
1 Command/setup error (missing files, invalid args, parse failures)
2 Regressions or changes detected with fail flags enabled
3 Trace violations increased with --fail-on-trace-violations
4 Trace drift detected with --fail-on-trace-drift
5 Trajectory drift detected with --fail-on-trajectory-drift

report

Generate HTML report from records.

insidellms report <run-dir> [options]

Arguments

Argument Description
run-dir Path to run directory with records.jsonl

Options

Option Description Default
--output FILE Output HTML file report.html in run-dir
--template FILE Custom HTML template Built-in

Examples

# Generate report
insidellms report ./my_run

# Custom output path
insidellms report ./my_run --output ./reports/comparison.html

validate

Validate run artifacts against schemas.

insidellms validate <config-or-run-dir> [options]

Arguments

Argument Description
config-or-run-dir Path to a config file (.yaml/.json) or run directory (manifest.json)

Options

Option Description Default
--mode {strict,warn} On schema mismatch for run-dir validation: strict=exit non-zero, warn=continue strict
--schema-version VER Override schema version when validating a run directory from manifest

Examples

# Validate a run
insidellms validate ./my_run

# Warn-only mode
insidellms validate ./my_run --mode warn

schema

Schema utilities.

insidellms schema [op] [options]

Operations

Operation Description
list (default) List available schemas and versions
dump Print/write a JSON Schema document
validate Validate .json or .jsonl input payloads
<SchemaName> Shortcut for dump --name <SchemaName>

Examples

# List schemas
insidellms schema list

# Dump a schema to stdout
insidellms schema dump --name ResultRecord

# Shortcut dump form
insidellms schema ResultRecord

# Validate a JSON object (manifest)
insidellms schema validate --name RunManifest --input ./baseline/manifest.json

# Validate a JSONL stream (records)
insidellms schema validate --name ResultRecord --input ./baseline/records.jsonl

# Warn-only mode
insidellms schema validate --name ResultRecord --input ./baseline/records.jsonl --mode warn

doctor

Check environment and dependencies.

insidellms doctor [options]

Options

Option Description Default
--format {text,json} Output format text
--fail-on-warn Exit non-zero if recommended dependency checks fail false
--capabilities Include capability matrix for models/probes/datasets/plugins/report outputs false

Checks Performed

  • Python version
  • Required dependencies
  • Optional dependencies (nlp, visualization)
  • API key environment variables
  • Write permissions for run root

Examples

# Check environment
insidellms doctor

# Capability matrix as JSON
insidellms doctor --format json --capabilities

attest

Generate attestation artifacts for an existing run directory.

insidellms attest <run-dir>

Arguments

Argument Description
run-dir Path to run directory (must contain manifest.json)

Examples

insidellms attest ./baseline

sign

Sign attestation envelopes in a run directory using Sigstore (cosign).

insidellms sign <run-dir>

Arguments

Argument Description
run-dir Path to run directory (must contain attestations/)

Examples

insidellms sign ./baseline

verify-signatures

Verify attestation signatures against Sigstore bundles.

insidellms verify-signatures <run-dir> [--identity ...]

Arguments

Argument Description
run-dir Path to run directory (must contain attestations/ and signing/)

Options

Option Description Default
--identity CONSTRAINTS Identity constraints passed to verifier None

Examples

insidellms verify-signatures ./baseline
insidellms verify-signatures ./baseline --identity "issuer=https://token.actions.githubusercontent.com"

init

Generate a sample configuration file.

insidellms init [output] [options]

Arguments

Argument Description
output Output file path (default: experiment.yaml)

Options

Option Description Default
--model TYPE Model type for the sample config dummy
--probe TYPE Probe type for the sample config logic
--template {basic,benchmark,tracking,full,harness} Configuration template to use basic
--interactive Run in interactive mode to configure the experiment false

Examples

# Generate basic experiment config
insidellms init

# Generate harness config for OpenAI
insidellms init harness.yaml --model openai --probe bias --template harness

# Interactive configuration wizard
insidellms init --interactive

list

List available models, probes, or datasets.

insidellms list <type> [options]

Arguments

Argument Description
type What to list: models, probes, datasets, trackers, or all

Options

Option Description Default
--filter TEXT Filter results by name (substring match) None
--detailed Show detailed information false

Examples

# List all available resources
insidellms list all

# List only models
insidellms list models

# List probes with detailed info
insidellms list probes --detailed

# Filter by name
insidellms list models --filter openai

info

Show detailed information about a model, probe, or dataset.

insidellms info <type> <name>

Arguments

Argument Description
type Type of item: model, probe, or dataset
name Name of the model, probe, or dataset

Examples

# Get info about a model
insidellms info model openai

# Get info about a probe
insidellms info probe logic

# Get info about a dataset
insidellms info dataset reasoning

benchmark

Run comprehensive benchmark suites.

insidellms benchmark [options]

Options

Option Description Default
--models LIST Comma-separated list of models to benchmark All available
--probes LIST Comma-separated list of probes to run All available
--datasets LIST Comma-separated list of benchmark datasets (e.g., reasoning,math,coding) All available
-n N Maximum examples per dataset 10
--output DIR Output directory for benchmark results Auto-generated
--html-report Generate an HTML report with visualizations false
--verbose Show detailed progress false

Examples

# Run full benchmark
insidellms benchmark --models openai,anthropic --probes logic,bias

# Benchmark with limited examples
insidellms benchmark --models gpt-4o -n 5

# Generate HTML report
insidellms benchmark --models openai --html-report --output ./benchmark_results

compare

Compare multiple models on the same inputs.

insidellms compare --models <models> [options]

Options

Option Description Default
--models LIST Comma-separated list of models to compare (required) None
--input TEXT Single input prompt to compare None
--input-file FILE File with inputs (one per line or JSON/JSONL) None
--output FILE Output file for comparison results stdout
--format {table,json,markdown} Output format table

Examples

# Compare models on a single prompt
insidellms compare --models gpt-4o,claude-3-5-sonnet --input "Explain quantum computing"

# Compare using input file
insidellms compare --models openai,anthropic --input-file prompts.txt --output comparison.json

# Markdown output for documentation
insidellms compare --models gpt-4o,gpt-4o-mini --input "Hello" --format markdown

export

Export results to various formats.

insidellms export <input> [options]

Arguments

Argument Description
input Input results file (JSON)

Options

Option Description Default
--format {csv,markdown,html,latex,jsonl} Export format csv
--output FILE Output file path stdout
--redact-pii Redact PII from exported data before writing false
--encrypt Encrypt JSONL output (requires --encryption-key-env) false
--encryption-key-env VAR Environment variable holding the Fernet key INSIDELLMS_ENCRYPTION_KEY

Examples

# Export to CSV
insidellms export results.json --format csv --output results.csv

# Export to Markdown for documentation
insidellms export results.json --format markdown --output RESULTS.md

# Export with PII redaction
insidellms export results.json --format jsonl --redact-pii

# Encrypted export
insidellms export results.json --format jsonl --encrypt --output encrypted.jsonl

trend

Show metric trends across run history.

insidellms trend --index <index-file> [options]

Options

Option Description Default
--index FILE Path to run index JSONL file (required) None
--add DIR Add a completed run directory to the index before showing trends None
--label TEXT Optional config label when indexing with --add None
--metric NAME Metric name to plot accuracy
--last N Only show the most recent N runs All
--threshold VALUE Threshold for metric alerts None
--fail-on-threshold Exit non-zero when threshold violations are detected false
--format {text,json} Output format text

Examples

# Show accuracy trend
insidellms trend --index runs.jsonl --metric accuracy

# Add a new run and show trends
insidellms trend --index runs.jsonl --add ./latest_run --label "v1.2.0"

# Alert on threshold violations
insidellms trend --index runs.jsonl --threshold 0.85 --fail-on-threshold

# Show only recent runs
insidellms trend --index runs.jsonl --last 10

interactive

Start an interactive exploration session.

insidellms interactive [options]

Options

Option Description Default
--model TYPE Model to use in interactive mode dummy
--history-file FILE File to store command history .insidellms_history

Examples

# Start interactive session with dummy model
insidellms interactive

# Interactive session with OpenAI
insidellms interactive --model openai

Interactive Commands

Once in interactive mode, you can:

  • Type prompts directly to send to the model
  • Use /help to see available commands
  • Use /switch <model> to change models
  • Use /history to view conversation history
  • Use /clear to clear the conversation
  • Use /quit or Ctrl+D to exit

generate-suite

Generate test suite from templates or seed examples.

insidellms generate-suite --target <target> [options]

Options

Option Description Default
--target TEXT Domain target for generated cases (required) None
--num-cases N Number of generated cases 50
--output FILE Output path for generated suite data/generated_suite.jsonl
--format {jsonl,json} Output format jsonl
--include-adversarial / --no-include-adversarial Include adversarial edge cases true
--model TYPE Model backend used for generation dummy
--model-args JSON JSON object of model init args {}
--seed-example TEXT Seed example to bootstrap generation (repeatable) Built-in seeds

Examples

# Generate test suite for a customer support bot
insidellms generate-suite --target "customer support bot" --num-cases 100

# Generate without adversarial cases
insidellms generate-suite --target "code assistant" --no-include-adversarial

# Use GPT-4 for generation
insidellms generate-suite --target "medical chatbot" --model openai --model-args '{"model_name":"gpt-4o"}'

# Custom seed examples
insidellms generate-suite --target "FAQ bot" \
  --seed-example "How do I reset my password?" \
  --seed-example "What are your business hours?"

optimize-prompt

Prompt optimization utilities.

insidellms optimize-prompt [prompt] [options]

Arguments

Argument Description
prompt The prompt text to optimize (optional if using --input-file)

Options

Option Description Default
--input-file FILE Read prompt text from file None
--strategies LIST Comma-separated strategies: compression, clarity, specificity, structure, example_selection All
--format {text,json} Output format text
--show-diff Show original and optimized prompts in terminal output false
--output FILE Output file for optimized prompt or JSON report stdout

Examples

# Optimize a prompt with all strategies
insidellms optimize-prompt "Tell me about AI"

# Optimize from file
insidellms optimize-prompt --input-file prompt.txt --output optimized.txt

# Specific optimization strategies
insidellms optimize-prompt "Explain X" --strategies clarity,specificity

# Show diff between original and optimized
insidellms optimize-prompt "Write code" --show-diff

# JSON report with all details
insidellms optimize-prompt "Summarize this" --format json --output report.json

Environment Variables

Variable Description
OPENAI_API_KEY OpenAI API key
ANTHROPIC_API_KEY Anthropic API key
GOOGLE_API_KEY Google/Gemini API key
CO_API_KEY Cohere API key
HUGGINGFACEHUB_API_TOKEN HuggingFace token
INSIDELLMS_RUN_ROOT Default run root directory
NO_COLOR Disable coloured output

Global Options

Available for all commands:

Option Description
--help Show help message
--version Show version number
--quiet Suppress non-error output