Probes

Focused behavioural tests for specific capabilities.

A probe takes a model and an input, runs a test, and returns an output (and an optional score).

The Probe Interface

class Probe:
    name: str                    # Unique identifier
    category: ProbeCategory      # LOGIC, BIAS, SAFETY, etc.

    def run(self, model, data, **kwargs) -> Any:
        """Execute the probe on a single input."""

    def run_batch(self, model, dataset, **kwargs) -> list:
        """Execute on multiple inputs."""

    def score(self, results) -> ProbeScore:
        """Aggregate results into a score."""

Probe Categories

Category      Tests For                 Example Probes
LOGIC         Reasoning, deduction      LogicProbe
BIAS          Demographic fairness      BiasProbe
SAFETY        Security, jailbreaks      AttackProbe, JailbreakProbe
FACTUALITY    Factual accuracy          FactualityProbe
CODE          Programming tasks         CodeGenerationProbe
INSTRUCTION   Following instructions    InstructionFollowingProbe
CUSTOM        User-defined              Your probes
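
Each probe carries its category on the category attribute from the interface above, so a collection of probes can be filtered by ProbeCategory. A minimal sketch (BiasProbe's import path is assumed; LogicProbe's matches the example below):

from insideLLMs.probes import LogicProbe, BiasProbe  # BiasProbe path assumed
from insideLLMs.types import ProbeCategory

probes = [LogicProbe(), BiasProbe()]

# Keep only the reasoning-oriented probes for this run
logic_probes = [p for p in probes if p.category == ProbeCategory.LOGIC]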

How Probes Work

Simple Example

from insideLLMs.probes import LogicProbe

probe = LogicProbe()
result = probe.run(model, {"question": "What is 2 + 2?"})
# result = "4" (model's response)

With Runner

from insideLLMs.runtime.runner import ProbeRunner

runner = ProbeRunner(model, probe)
results = runner.run([
    {"question": "What is 2 + 2?"},
    {"question": "What is 3 + 3?"},
])

Batch Execution

results = probe.run_batch(
    model,
    dataset,
    max_workers=4,
    progress_callback=lambda c, t: print(f"{c}/{t}")
)

Input Formats

Probes accept various input formats:

Dict with Fields

{"question": "...", "expected": "..."}
{"prompt": "...", "constraints": [...]}
{"code": "...", "bug_type": "syntax"}

String (Simple)

"What is the capital of France?"

Messages (Chat)

{
    "messages": [
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hello!"}
    ]
}
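
A custom probe may want to normalize these formats into a single prompt string before calling the model. A minimal sketch of such a helper (illustrative only, not part of the insideLLMs API):

def to_prompt(data) -> str:
    """Collapse the input formats above into one prompt string (illustrative helper)."""
    if isinstance(data, str):            # plain string input
        return data
    if "messages" in data:               # chat-style input
        return "\n".join(m["content"] for m in data["messages"])
    # dict with fields: prefer common keys, fall back to the raw dict
    return data.get("question") or data.get("prompt") or str(data)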

Scoring

Probes can evaluate and score their own results by subclassing ScoredProbe:

class MyScoredProbe(ScoredProbe):
    def evaluate_single(self, output, reference, input_data):
        is_correct = reference.lower() in output.lower()
        return {"correct": is_correct, "confidence": 0.9}

    def score(self, results):
        correct = sum(1 for r in results if r.output.get("correct"))
        return ProbeScore(value=correct / len(results))
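
Used with run_batch, the scored probe produces per-item results that score() aggregates. A sketch of that flow (assumes the probe above plus a configured model and non-empty dataset, and that each batch result exposes the evaluate_single output on its output field, as score() above expects):

probe = MyScoredProbe()
results = probe.run_batch(model, dataset)

score = probe.score(results)
print(score.value)  # fraction of items where the reference appeared in the output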

Probe Lifecycle

sequenceDiagram
    participant R as Runner
    participant P as Probe
    participant M as Model

    loop For each input
        R->>P: run(model, input)
        P->>P: Format prompt
        P->>M: generate(prompt)
        M-->>P: response
        P->>P: Process output
        P-->>R: result
    end
    R->>P: score(results)
    P-->>R: ProbeScore
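
In code terms, the loop above is roughly the following (an illustrative sketch of what a runner does, not the actual ProbeRunner implementation):

results = []
for item in dataset:
    # run() formats the prompt, calls model.generate(), and processes the output
    results.append(probe.run(model, item))

score = probe.score(results)  # aggregate once all inputs have been processed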

Built-in vs Custom

Use Built-in When:

  • You are testing standard capabilities (logic, bias, safety)
  • You need proven, well-tested implementations
  • You want consistent results across projects

Use Custom When:

  • You need domain-specific evaluation (legal, medical, etc.)
  • You have proprietary scoring logic
  • You need to integrate with external systems

Creating Custom Probes

See the Custom Probe Tutorial for a complete walkthrough.

Quick template:

from insideLLMs.probes.base import Probe
from insideLLMs.types import ProbeCategory

class MyProbe(Probe[dict]):
    name = "my_probe"
    default_category = ProbeCategory.CUSTOM

    def run(self, model, data, **kwargs) -> dict:
        prompt = data.get("prompt", str(data))
        response = model.generate(prompt)
        return {"response": response, "custom_field": "value"}

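Using the template looks the same as any other probe (a sketch; the model and the input dict are placeholders):

probe = MyProbe()
result = probe.run(model, {"prompt": "Summarize the main risk in this clause."})
print(result["response"])
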
See Also