Configuration Reference

Complete reference for YAML/JSON configuration files.

Which Config API to Use

insideLLMs has two configuration surfaces; use the one that matches your workflow:

| Use case | Module | Primary types | When to use |
| --- | --- | --- | --- |
| YAML/JSON config files (run, harness) | insideLLMs.config | ExperimentConfig, ModelConfig, ProbeConfig, DatasetConfig, RunnerConfig | Loading from files, load_config(), save_config_to_yaml() |
| Programmatic runner control | insideLLMs.config_types | RunConfig, RunConfigBuilder, ProgressInfo | Passing options to ProbeRunner.run() / AsyncProbeRunner.run() |

  • Config files: insideLLMs.config (Pydantic models for YAML/JSON).
  • Runner kwargs: insideLLMs.config_types.RunConfig (dataclass-style runtime controls).

The CLI (insidellms run, insidellms harness) loads YAML via insideLLMs.config and internally builds RunConfig-compatible settings.

Config Types

| Type | Command | Purpose |
| --- | --- | --- |
| Run Config | insidellms run | Single model/probe execution |
| Harness Config | insidellms harness | Multi-model comparison |
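
The two config types can be told apart by their top-level keys: a run config uses singular model/probe, a harness config uses plural models/probes. The sketch below illustrates that distinction on already-parsed dictionaries. The helper config_kind is hypothetical and not part of insideLLMs; the CLI may decide this differently.

```python
def config_kind(config: dict) -> str:
    """Guess which command a parsed config targets from its top-level keys.

    Hypothetical helper for illustration; not an insideLLMs API.
    """
    if "models" in config and "probes" in config:
        return "harness"
    if "model" in config and "probe" in config:
        return "run"
    raise ValueError("config has neither model/probe nor models/probes keys")


run_cfg = {"model": {"type": "dummy"}, "probe": {"type": "logic"}}
harness_cfg = {"models": [{"type": "dummy"}], "probes": [{"type": "logic"}]}

print(config_kind(run_cfg))
print(config_kind(harness_cfg))
```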

Run Config

For insidellms run:

# model: The model to use
model:
  type: openai           # Model type (required)
  args:                  # Model constructor arguments
    model_name: gpt-4o
    temperature: 0.7

# probe: The probe to run
probe:
  type: logic            # Probe type (required)
  args: {}               # Probe constructor arguments

# dataset: Input data
dataset:
  format: jsonl          # Format: jsonl, csv, hf
  path: data/test.jsonl  # Path to dataset file

# Optional settings
generation:              # Passed to probe/model generate call
  temperature: 0.7
  max_tokens: 800

For execution controls (validation/resume/overwrite/async), use CLI flags:

insidellms run config.yaml --async --concurrency 10
insidellms run config.yaml --validate-output --validation-mode warn
insidellms run config.yaml --resume
insidellms run config.yaml --overwrite

Minimal Example

model:
  type: dummy

probe:
  type: logic

dataset:
  format: jsonl
  path: data/test.jsonl
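
Since this reference covers JSON as well as YAML files, the minimal example can presumably be written as JSON too. The sketch below builds the same structure as a Python dictionary and serialises it with the standard library. It assumes the JSON form uses the same keys one-to-one, which this page does not state explicitly.

```python
import json

# Same structure as the minimal YAML example above.
# Assumption: JSON configs use identical key names.
minimal_run_config = {
    "model": {"type": "dummy"},
    "probe": {"type": "logic"},
    "dataset": {"format": "jsonl", "path": "data/test.jsonl"},
}

text = json.dumps(minimal_run_config, indent=2)
print(text)
```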

Harness Config

For insidellms harness:

# models: List of models to compare
models:
  - type: openai
    args:
      model_name: gpt-4o
  - type: anthropic
    args:
      model_name: claude-3-5-sonnet-20241022

# probes: List of probes to run
probes:
  - type: logic
  - type: factuality
  - type: bias

# dataset: Shared dataset
dataset:
  format: jsonl
  path: data/test.jsonl

# Output settings
output_dir: ./comparison_results

# Optional settings
max_examples: 50
confidence_level: 0.95
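
To get a feel for how large a harness run is, it can help to enumerate the model/probe combinations the config describes. The sketch below assumes every model is paired with every probe over the shared dataset; how the harness actually schedules work is not specified here, so treat this as an estimate of the comparison grid only.

```python
from itertools import product

# Mirrors the models and probes lists from the harness config above.
models = [
    {"type": "openai", "args": {"model_name": "gpt-4o"}},
    {"type": "anthropic", "args": {"model_name": "claude-3-5-sonnet-20241022"}},
]
probes = [{"type": "logic"}, {"type": "factuality"}, {"type": "bias"}]

# Assumption: each model is evaluated with each probe.
pairs = [(m["type"], p["type"]) for m, p in product(models, probes)]

for model_type, probe_type in pairs:
    print(f"{model_type} x {probe_type}")
```

With max_examples: 50, this grid would correspond to at most 6 x 50 model calls under the same assumption.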

Dataset Formats

JSONL

dataset:
  format: jsonl
  path: data/test.jsonl

File format:

{"question": "What is 2 + 2?", "expected": "4"}
{"question": "What colour is the sky?", "expected": "blue"}
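
JSONL means one JSON object per line. A minimal sketch of reading such a file with the standard library, handy for sanity-checking a dataset before a run (the parse_jsonl helper is illustrative, not an insideLLMs function):

```python
import json

jsonl_text = (
    '{"question": "What is 2 + 2?", "expected": "4"}\n'
    '{"question": "What colour is the sky?", "expected": "blue"}\n'
)


def parse_jsonl(text: str) -> list:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


records = parse_jsonl(jsonl_text)
print(len(records), records[0]["question"])
```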

CSV

dataset:
  format: csv
  path: data/test.csv
  columns:
    question: prompt_column
    expected: answer_column
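
The columns block maps dataset fields to CSV headers. The sketch below shows one plausible reading of that mapping, assuming each key (question, expected) is the field name the probe sees and each value (prompt_column, answer_column) is the header in the CSV file. The remap_rows helper is hypothetical.

```python
import csv
import io

# Assumption: keys are target field names, values are CSV column headers.
columns = {"question": "prompt_column", "expected": "answer_column"}

csv_text = "prompt_column,answer_column\nWhat is 2 + 2?,4\n"


def remap_rows(text: str, columns: dict) -> list:
    """Read CSV rows and rename columns according to the mapping."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {field: row[header] for field, header in columns.items()}
        for row in reader
    ]


rows = remap_rows(csv_text, columns)
print(rows)
```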

HuggingFace

dataset:
  format: hf
  name: cais/mmlu
  split: test

max_examples: 100

Model Configuration

Common Options

model:
  type: openai           # Required: model type
  args:
    model_name: gpt-4o   # Model identifier
    temperature: 0.7     # Sampling temperature (0.0-2.0)
    max_tokens: 1000     # Max response tokens
    timeout: 60          # Request timeout in seconds

Provider-Specific

OpenAI

model:
  type: openai
  args:
    model_name: gpt-4o
    temperature: 0.7
    max_tokens: 1000
    top_p: 1.0
    frequency_penalty: 0.0
    presence_penalty: 0.0

Anthropic

model:
  type: anthropic
  args:
    model_name: claude-3-5-sonnet-20241022
    max_tokens: 1000
    temperature: 0.7

Ollama

model:
  type: ollama
  args:
    model_name: llama3
    base_url: http://localhost:11434

DummyModel

model:
  type: dummy
  args:
    name: test_model
    canned_response: "Fixed test response"

Probe Configuration

Basic

probe:
  type: logic
  args: {}

With Options

probe:
  type: logic
  args:
    strict: true
    timeout: 30

Multiple Probes (Harness)

probes:
  - type: logic
  - type: bias
    args:
      sensitivity: high
  - type: factuality

Path Resolution

Relative paths are resolved against the config file’s directory, not the current working directory.

# If config is at /project/configs/harness.yaml
dataset:
  path: ../data/test.jsonl  # Resolves to /project/data/test.jsonl
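
The rule above can be sketched with the standard library: join the relative path onto the config file's parent directory, then normalise it. The function name resolve_dataset_path is hypothetical, and leaving absolute paths unchanged is an assumption; the library's own resolver may differ in details.

```python
import posixpath
from pathlib import PurePosixPath


def resolve_dataset_path(config_file: str, dataset_path: str) -> str:
    """Resolve dataset_path relative to the directory containing config_file."""
    path = PurePosixPath(dataset_path)
    if path.is_absolute():
        # Assumption: absolute paths are used unchanged.
        return str(path)
    base = PurePosixPath(config_file).parent
    # normpath collapses the ".." segment lexically.
    return posixpath.normpath(str(base / path))


resolved = resolve_dataset_path("/project/configs/harness.yaml", "../data/test.jsonl")
print(resolved)
```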

Environment Variables

Reference environment variables in configs:

model:
  type: openai
  args:
    api_key: ${OPENAI_API_KEY}  # Expanded at runtime
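
As an illustration of what ${VAR} expansion involves, the sketch below walks a parsed config and substitutes environment values into strings. It is not the library's implementation: the expand_env helper is hypothetical, and leaving unset variables untouched is an assumption about behaviour this page does not document.

```python
import os
import re

_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value, env=None):
    """Recursively replace ${NAME} in strings with values from env."""
    env = os.environ if env is None else env
    if isinstance(value, dict):
        return {k: expand_env(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_env(v, env) for v in value]
    if isinstance(value, str):
        # Assumption: unknown variables are left as-is rather than raising.
        return _VAR.sub(lambda m: env.get(m.group(1), m.group(0)), value)
    return value


config = {"model": {"type": "openai", "args": {"api_key": "${OPENAI_API_KEY}"}}}
expanded = expand_env(config, env={"OPENAI_API_KEY": "sk-example"})
print(expanded["model"]["args"]["api_key"])
```

Keeping the key in the environment rather than in the file means the config can be committed without exposing secrets.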

Execution Options

# Limit dataset
max_examples: 100

# Optional generation kwargs passed through to probes/models
generation:
  temperature: 0.3
  max_tokens: 500
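
A minimal sketch of what a max_examples cap does to a dataset, assuming the first N examples are kept in order. Whether insideLLMs truncates or samples is not stated here, and limit_examples is a hypothetical helper.

```python
from itertools import islice


def limit_examples(examples, max_examples=None):
    """Return at most max_examples items, preserving order."""
    if max_examples is None:
        return list(examples)
    # Assumption: the cap keeps the leading examples rather than sampling.
    return list(islice(examples, max_examples))


subset = limit_examples(range(1000), max_examples=100)
print(len(subset))
```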

Execution controls are CLI flags:

insidellms run config.yaml --async --concurrency 10
insidellms run config.yaml --validate-output --schema-version 1.0.0
insidellms run config.yaml --resume
insidellms run config.yaml --overwrite

Complete Examples

Minimal Run Config

model:
  type: dummy
probe:
  type: logic
dataset:
  format: jsonl
  path: data/test.jsonl

Production Harness

models:
  - type: openai
    args:
      model_name: gpt-4o
      temperature: 0.3
  - type: anthropic
    args:
      model_name: claude-3-5-sonnet-20241022
      temperature: 0.3

probes:
  - type: logic
  - type: factuality
  - type: bias
  - type: instruction_following

dataset:
  format: jsonl
  path: data/evaluation_set.jsonl

output_dir: ./evaluation_results
max_examples: 500
confidence_level: 0.95

CI Baseline Config

models:
  - type: dummy
    args:
      name: baseline

probes:
  - type: logic

dataset:
  format: jsonl
  path: ci/test_data.jsonl

output_dir: ci/baseline

Validation

Validate your config before running:

# Check config syntax
python -c "import yaml; yaml.safe_load(open('config.yaml'))"

# Validate config against schema contracts
insidellms validate config.yaml
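
For a quick pre-flight check inside a script, the requirements stated in this reference (model.type and probe.type are required; dataset.format is one of jsonl, csv, hf) can be tested on a parsed config. This is a rough sketch and not a substitute for insidellms validate; check_run_config is a hypothetical helper and covers only those rules.

```python
def check_run_config(config: dict) -> list:
    """Return a list of problems found in a parsed run config (empty if none)."""
    problems = []
    for section in ("model", "probe"):
        block = config.get(section)
        if not isinstance(block, dict):
            problems.append(f"missing section: {section}")
        elif "type" not in block:
            problems.append(f"{section}.type is required")
    dataset = config.get("dataset")
    if not isinstance(dataset, dict):
        problems.append("missing section: dataset")
    elif dataset.get("format") not in {"jsonl", "csv", "hf"}:
        problems.append("dataset.format must be one of jsonl, csv, hf")
    return problems


good = {
    "model": {"type": "dummy"},
    "probe": {"type": "logic"},
    "dataset": {"format": "jsonl", "path": "data/test.jsonl"},
}
bad = {"model": {"type": "dummy"}, "probe": {}, "dataset": {"format": "xml"}}

print(check_run_config(good))
print(check_run_config(bad))
```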

See Also