# Basic run
insidellms run config.yaml
# With explicit output directory
insidellms run config.yaml --run-dir ./my_run
# Async with concurrency
insidellms run config.yaml --async --concurrency 10
# Resume interrupted run
insidellms run config.yaml --run-dir ./my_run --resume
# Overwrite existing run
insidellms run config.yaml --run-dir ./my_run --overwrite
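The `--run-dir` and `--resume` flags combine naturally in a wrapper that restarts a job after an interruption. The following is a minimal sketch, assuming that an existing run directory means an earlier run was interrupted and should be resumed. The helper name `run_args` and that policy are illustrative and not part of insidellms.

```shell
# Sketch: choose between a fresh run and a resumed run.
# Assumption: an existing run directory indicates an interrupted run.
# run_args is a hypothetical helper, not part of insidellms.

run_args() {
  # $1 = config file, $2 = run directory
  if [ -d "$2" ]; then
    # Directory already exists: continue the earlier run.
    printf '%s\n' "run $1 --run-dir $2 --resume"
  else
    # No directory yet: start a new run there.
    printf '%s\n' "run $1 --run-dir $2"
  fi
}

# Usage (word splitting is intentional, so paths must not contain spaces):
#   insidellms $(run_args config.yaml ./my_run)
```

Printing the arguments instead of invoking the tool keeps the decision easy to inspect before anything is executed.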
# Quick test with dummy model
insidellms quicktest "What is 2 + 2?" --model dummy
# Test with OpenAI
insidellms quicktest "Explain gravity" --model openai --model-args '{"model_name":"gpt-4o"}'
# With specific parameters
insidellms quicktest "Be creative" --model openai --temperature 1.5
diff
Compare two run directories.
insidellms diff <baseline> <candidate> [options]
Arguments
| Argument | Description |
|----------|-------------|
| baseline | Path to baseline run directory |
| candidate | Path to candidate run directory |
Options
| Option | Description | Default |
|--------|-------------|---------|
| --output FILE | Write JSON diff report to file (--format json) | stdout |
| --fail-on-regressions | Exit code 2 if regressions are detected | false |
| --fail-on-changes | Exit code 2 if any differences are detected | false |
| --fail-on-trace-violations | Exit code 3 if trace violations increase | false |
| --fail-on-trace-drift | Exit code 4 if trace fingerprints drift | false |
| --fail-on-trajectory-drift | Exit code 5 if agent/tool trajectory drifts | false |
| --output-fingerprint-ignore KEYS | Comma-separated output keys to ignore (repeatable) | None |
| --judge | Apply deterministic judge triage over diff items | false |
| --judge-policy {strict,balanced} | Judge policy for breaking/review decisions | strict |
| --judge-limit N | Maximum judged items to include | 25 |
| --interactive | Review diffs and optionally accept candidate as baseline | |
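Because each `--fail-on-*` flag maps to a distinct exit code, a CI step can report why a diff failed. The sketch below takes the code meanings from the options listed above. The helper name `explain_diff_exit` is hypothetical, and treating 0 as success is the usual shell convention rather than something stated in this reference.

```shell
# Sketch: translate `insidellms diff` exit codes into CI messages.
# Code meanings are taken from the diff options; explain_diff_exit is a
# hypothetical helper, and 0 = success is assumed by shell convention.

explain_diff_exit() {
  case "$1" in
    0) echo "ok" ;;
    2) echo "regressions or changes detected" ;;
    3) echo "trace violations increased" ;;
    4) echo "trace fingerprints drifted" ;;
    5) echo "agent/tool trajectory drifted" ;;
    *) echo "unexpected exit code $1" ;;
  esac
}

# Usage in a CI step (paths are placeholders):
#   insidellms diff ./baseline ./candidate \
#     --fail-on-regressions --fail-on-trace-violations
#   status=$?
#   explain_diff_exit "$status"
#   exit "$status"
```

Exit code 2 is shared by `--fail-on-regressions` and `--fail-on-changes`, so the message for it cannot tell the two apart.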
What to list: models, probes, datasets, trackers, or all
Options
| Option | Description | Default |
|--------|-------------|---------|
| --filter TEXT | Filter results by name (substring match) | None |
| --detailed | Show detailed information | false |
Examples
# List all available resources
insidellms list all
# List only models
insidellms list models
# List probes with detailed info
insidellms list probes --detailed
# Filter by name
insidellms list models --filter openai
info
Show detailed information about a model, probe, or dataset.
insidellms info <type> <name>
Arguments
| Argument | Description |
|----------|-------------|
| type | Type of item: model, probe, or dataset |
| name | Name of the model, probe, or dataset |
Examples
# Get info about a model
insidellms info model openai
# Get info about a probe
insidellms info probe logic
# Get info about a dataset
insidellms info dataset reasoning
benchmark
Run comprehensive benchmark suites.
insidellms benchmark [options]
Options
| Option | Description | Default |
|--------|-------------|---------|
| --models LIST | Comma-separated list of models to benchmark | All available |
| --probes LIST | Comma-separated list of probes to run | All available |
| --datasets LIST | Comma-separated list of benchmark datasets (e.g., reasoning,math,coding) | All available |
| -n N | Maximum examples per dataset | 10 |
| --output DIR | Output directory for benchmark results | Auto-generated |
| --html-report | Generate an HTML report with visualizations | false |
| --verbose | Show detailed progress | false |
Examples
# Run full benchmark
insidellms benchmark --models openai,anthropic --probes logic,bias
# Benchmark with limited examples
insidellms benchmark --models gpt-4o -n 5
# Generate HTML report
insidellms benchmark --models openai --html-report --output ./benchmark_results
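When the model and probe names come from a script rather than being typed by hand, the comma-separated `LIST` values can be assembled with a small helper. This is a sketch only: `join_by_comma` is a hypothetical function, and the command is printed rather than executed so that it can be checked first.

```shell
# Sketch: build comma-separated LIST values for --models and --probes.
# join_by_comma is a hypothetical helper, not part of insidellms.

join_by_comma() (
  # Runs in a subshell, so changing IFS does not leak to the caller.
  # "$*" joins all arguments using the first character of IFS.
  IFS=,
  printf '%s\n' "$*"
)

models=$(join_by_comma openai anthropic)
probes=$(join_by_comma logic bias)

# Print the command instead of running it, so it can be reviewed first.
echo "insidellms benchmark --models $models --probes $probes -n 5"
```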
compare
Compare multiple models on the same inputs.
insidellms compare --models <models> [options]
Options
| Option | Description | Default |
|--------|-------------|---------|
| --models LIST | Comma-separated list of models to compare (required) | None |
| --input TEXT | Single input prompt to compare | None |
| --input-file FILE | File with inputs (one per line or JSON/JSONL) | None |
| --output FILE | Output file for comparison results | stdout |
| --format {table,json,markdown} | Output format | table |
Examples
# Compare models on a single prompt
insidellms compare --models gpt-4o,claude-3-5-sonnet --input "Explain quantum computing"
# Compare using input file
insidellms compare --models openai,anthropic --input-file prompts.txt --output comparison.json
# Markdown output for documentation
insidellms compare --models gpt-4o,gpt-4o-mini --input "Hello" --format markdown
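For `--input-file`, the simplest of the accepted layouts is one input per line. The sketch below writes such a file. The file name and prompt texts are placeholders, and the JSON/JSONL layouts are not shown because their schema is not described in this reference.

```shell
# Sketch: write a one-prompt-per-line file for `compare --input-file`.
# The file name and the prompts are placeholders.
prompts_file="prompts.txt"

printf '%s\n' \
  "Explain quantum computing" \
  "Summarize the theory of relativity in two sentences" \
  "What is 2 + 2?" > "$prompts_file"

# Each line is one input, so the line count equals the number of prompts.
wc -l < "$prompts_file"

# Then, with insidellms installed:
#   insidellms compare --models openai,anthropic \
#     --input-file "$prompts_file" --format json --output comparison.json
```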
# Generate test suite for a customer support bot
insidellms generate-suite --target "customer support bot" --num-cases 100
# Generate without adversarial cases
insidellms generate-suite --target "code assistant" --no-include-adversarial
# Use GPT-4o for generation
insidellms generate-suite --target "medical chatbot" --model openai --model-args '{"model_name":"gpt-4o"}'
# Custom seed examples
insidellms generate-suite --target "FAQ bot" \
  --seed-example "How do I reset my password?" \
  --seed-example "What are your business hours?"
optimize-prompt
Prompt optimization utilities.
insidellms optimize-prompt [prompt] [options]
Arguments
| Argument | Description |
|----------|-------------|
| prompt | The prompt text to optimize (optional if using --input-file) |
Show original and optimized prompts in terminal output
false
--output FILE
Output file for optimized prompt or JSON report
stdout
Examples
# Optimize a prompt with all strategies
insidellms optimize-prompt "Tell me about AI"
# Optimize from file
insidellms optimize-prompt --input-file prompt.txt --output optimized.txt
# Specific optimization strategies
insidellms optimize-prompt "Explain X" --strategies clarity,specificity
# Show diff between original and optimized
insidellms optimize-prompt "Write code" --show-diff
# JSON report with all details
insidellms optimize-prompt "Summarize this"--format json --output report.json