Frequently Asked Questions
Quick answers to common questions. For detailed troubleshooting, see Troubleshooting.
Installation
Do I need API keys to get started?
No. Use DummyModel for offline testing:
insidellms quicktest "Hello" --model dummy
API keys are only needed for hosted providers (OpenAI, Anthropic, etc.).
What Python version do I need?
Python 3.10 or higher. Check with:
python --version
How do I install optional features?
pip install -e ".[nlp]" # NLP features
pip install -e ".[visualisation]" # Charts and reports
pip install -e ".[all]" # Everything
Configuration
Why can’t it find my dataset file?
Relative paths are resolved from the config file’s directory, not your current directory.
# If config is at /project/configs/harness.yaml
dataset:
  path: ../data/test.jsonl  # Resolves to /project/data/test.jsonl
How do I use environment variables in configs?
model:
  type: openai
  args:
    api_key: ${OPENAI_API_KEY}
What’s the difference between run and harness?
| Command | Models | Probes | Use Case |
|---|---|---|---|
| run | Single | Single | Simple tests |
| harness | Multiple | Multiple | Comparisons |
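For example, both commands take a config file path:
insidellms run config.yaml        # one model, one probe
insidellms harness config.yaml    # compare multiple models and probes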
Running
Why does --overwrite refuse to overwrite?
This is a safety guard: insideLLMs only overwrites directories that contain a .insidellms_run marker file.
Solutions:
- Use --overwrite with a valid run directory (see the example below)
- Delete the directory manually
- Use a new directory name
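For example, using the same flags shown elsewhere in this FAQ:
# Overwrite an existing run directory (must contain the .insidellms_run marker)
insidellms run config.yaml --run-dir ./my_run --overwrite
# Or remove the directory manually and start fresh
rm -rf ./my_run
insidellms run config.yaml --run-dir ./my_run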
How do I resume an interrupted run?
insidellms run config.yaml --run-dir ./my_run --resume
This continues from where the run left off, using the existing records.jsonl.
Can I run multiple models in parallel?
Yes, with async execution:
async: true
concurrency: 10
Or via CLI: --async --concurrency 10
Models
Can I run local models?
Yes! Supported options:
| Runner | Setup |
|---|---|
| Ollama | ollama pull llama3 |
| llama.cpp | Download GGUF model |
| vLLM | pip install vllm |
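As a sketch, a model config for the Ollama runner could look like this; the exact type value and argument names are assumptions, so check the guide below for the supported fields:
model:
  type: ollama
  args:
    model_name: llama3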
See Local Models Guide.
How do I compare different models?
Use a harness config:
models:
  - type: openai
    args: {model_name: gpt-4o}
  - type: anthropic
    args: {model_name: claude-3-5-sonnet-20241022}
What models are supported?
OpenAI, Anthropic, Google/Gemini, Cohere, HuggingFace, Ollama, vLLM, llama.cpp, and custom implementations. See Models Catalog.
Cost & Performance
How do I reduce API costs?
- Limit examples: max_examples: 50 (see the combined example below)
- Enable caching: cache: {enabled: true}
- Use cheaper models: Start with gpt-4o-mini
- Test with DummyModel: No cost for framework testing
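A combined low-cost config might look like this (a sketch; the top-level placement of max_examples is an assumption, so adjust it to match your config layout):
model:
  type: openai
  args:
    model_name: gpt-4o-mini
max_examples: 50
cache:
  enabled: true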
How do I speed up runs?
async: true
concurrency: 20
cache:
  enabled: true
I’m hitting rate limits. What do I do?
rate_limit:
  enabled: true
  requests_per_minute: 60
concurrency: 5  # Lower this
See Rate Limiting Guide.
Outputs
What files does insideLLMs create?
| File | Purpose |
|---|---|
| records.jsonl | Raw results |
| manifest.json | Run metadata |
| summary.json | Aggregated stats |
| report.html | Visual report |
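Because records.jsonl is newline-delimited JSON, you can inspect it directly. A minimal sketch (the fields inside each record are not listed here, so this just parses and prints each one):
import json

with open("my_run/records.jsonl") as f:
    for line in f:
        record = json.loads(line)  # one result per line
        print(record)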
How do I keep outputs out of ~/.insidellms?
# Per-run
insidellms run config.yaml --run-dir ./my_output
# Global default
export INSIDELLMS_RUN_ROOT=./runs
How do I generate just a report?
insidellms report ./my_run
CI Integration
How do I detect behavioural changes in CI?
insidellms diff ./baseline ./candidate --fail-on-changes
Exit code 1 = changes detected, 0 = identical.
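In GitHub Actions, for example, the non-zero exit code fails the job automatically. A sketch of a workflow step (the directory names are illustrative and assume a committed baseline):
- name: Check for behavioural changes
  run: |
    insidellms harness config.yaml --run-dir ./candidate
    insidellms diff ./baseline ./candidate --fail-on-changes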
Why do my CI runs produce different outputs?
Model responses are non-deterministic. For deterministic CI:
models:
  - type: dummy
    args:
      response: "Fixed response"
How do I update the baseline after intentional changes?
insidellms harness config.yaml --run-dir ./baseline --overwrite
git add ./baseline
git commit -m "Update baseline: [describe changes]"
Troubleshooting
“command not found: insidellms”
Activate your virtual environment:
source .venv/bin/activate
Or run as module: python -m insideLLMs.cli
“Invalid API key”
- Check key format (OpenAI: sk-..., Anthropic: sk-ant-...)
- Verify the key in your provider dashboard
- Ensure the env var is set: echo $OPENAI_API_KEY
How do I turn off coloured output?
export NO_COLOR=1
Where can I find example datasets?
- data/ directory in the repo
- benchmarks/ for standard benchmarks
- insideLLMs.benchmark_datasets for built-in datasets
- HuggingFace datasets via config
Advanced
Can I create custom probes?
Yes! See Custom Probe Tutorial.
from insideLLMs.probes.base import Probe

class MyProbe(Probe):
    def run(self, model, data, **kwargs):
        # Send the example's prompt to the model and return the raw output
        return model.generate(data["prompt"])
Can I create custom models?
Yes! Implement the Model interface:
from insideLLMs.models.base import Model

class MyModel(Model):
    def generate(self, prompt: str, **kwargs) -> str:
        # Return the model's response text for the given prompt
        return "response"
How do I integrate with LangChain?
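There is no built-in adapter shown here, but one approach is to wrap a LangChain chat model in a custom Model, as in the previous answer. A minimal sketch, assuming the langchain-openai package is installed; the class and model names are illustrative, and the Model base class may have requirements not shown in this FAQ:
from langchain_openai import ChatOpenAI

from insideLLMs.models.base import Model

class LangChainModel(Model):
    def __init__(self, model_name: str = "gpt-4o-mini"):
        self.llm = ChatOpenAI(model=model_name)

    def generate(self, prompt: str, **kwargs) -> str:
        # invoke() returns an AIMessage; .content holds the response text
        return self.llm.invoke(prompt).content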
Getting Help
- Troubleshooting Guide
- GitHub Issues
- Run insidellms doctor to check your environment