Models Catalog
Complete reference for all supported model providers.
Overview
| Provider | Model Type | API Key Required |
|---|---|---|
| OpenAI | openai | Yes |
| Anthropic | anthropic | Yes |
| Google/Gemini | gemini | Yes |
| Cohere | cohere | Yes |
| OpenRouter | openrouter | Yes |
| HuggingFace | huggingface | Optional |
| Ollama | ollama | No |
| vLLM | vllm | No |
| llama.cpp | llamacpp | No |
| DummyModel | dummy | No |
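To see at a glance which providers from the table are usable in the current shell, a small check over the environment variables named in the provider sections of this page can help. This is only a sketch: `REQUIRED_KEYS`, `NO_KEY_NEEDED`, and `configured_providers` are illustrative names, not part of insideLLMs.

```python
import os

# Environment variables named in the provider sections of this page.
# Cohere accepts either of two names; any one being set is enough.
REQUIRED_KEYS = {
    "openai": ("OPENAI_API_KEY",),
    "anthropic": ("ANTHROPIC_API_KEY",),
    "gemini": ("GOOGLE_API_KEY",),
    "cohere": ("CO_API_KEY", "COHERE_API_KEY"),
    "openrouter": ("OPENROUTER_API_KEY",),
}

# Providers the table lists as needing no key (the HuggingFace token is optional).
NO_KEY_NEEDED = ("huggingface", "ollama", "vllm", "llamacpp", "dummy")


def configured_providers(env=None):
    """Return provider types whose key is set, followed by those that need none."""
    env = os.environ if env is None else env
    ready = [
        provider
        for provider, names in REQUIRED_KEYS.items()
        if any(env.get(name) for name in names)
    ]
    return ready + list(NO_KEY_NEEDED)


print(configured_providers())
```

Passing a plain dict instead of `os.environ` makes the helper easy to exercise in tests.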
OpenAI
OpenAI models (GPT-4, GPT-3.5, etc.)
Environment Variables
export OPENAI_API_KEY="sk-..."
export OPENAI_ORG_ID="org-..." # Optional
Config
models:
  - type: openai
    args:
      model_name: gpt-4o
      temperature: 0.7
      max_tokens: 1000
Available Models
| Model | Description |
|---|---|
| gpt-4o | Latest GPT-4 Omni |
| gpt-4o-mini | Smaller, faster GPT-4 |
| gpt-4-turbo | GPT-4 Turbo |
| gpt-3.5-turbo | Fast and affordable |
Python
from insideLLMs.models import OpenAIModel
model = OpenAIModel(
    model_name="gpt-4o",
    temperature=0.7,
    max_tokens=1000
)
response = model.generate("Hello, world!")
Common Options
| Option | Type | Default | Description |
|---|---|---|---|
| model_name | str | "gpt-4o-mini" | Model identifier |
| temperature | float | 1.0 | Sampling temperature |
| max_tokens | int | None | Max response tokens |
| top_p | float | 1.0 | Nucleus sampling |
| timeout | int | 60 | Request timeout |
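To make the defaults concrete, the sketch below overlays the `args` from the config example on the defaults listed in the table. It assumes that options left unset simply fall back to the table values; `OPENAI_DEFAULTS` and `resolve_openai_args` are hypothetical names used for illustration, not the library's own code.

```python
# Defaults copied from the Common Options table above.
OPENAI_DEFAULTS = {
    "model_name": "gpt-4o-mini",
    "temperature": 1.0,
    "max_tokens": None,
    "top_p": 1.0,
    "timeout": 60,
}


def resolve_openai_args(args):
    """Overlay user-supplied args on the table defaults (user values win)."""
    return {**OPENAI_DEFAULTS, **args}


# The args from the config example: three options set, two left at their defaults.
resolved = resolve_openai_args(
    {"model_name": "gpt-4o", "temperature": 0.7, "max_tokens": 1000}
)
print(resolved)
```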
Anthropic
Anthropic Claude models.
Environment Variables
export ANTHROPIC_API_KEY="sk-ant-..."
Config
models:
  - type: anthropic
    args:
      model_name: claude-3-5-sonnet-20241022
      max_tokens: 1000
Available Models
| Model | Description |
|---|---|
| claude-3-5-sonnet-20241022 | Latest Sonnet |
| claude-3-opus-20240229 | Most capable |
| claude-3-haiku-20240307 | Fastest |
Python
from insideLLMs.models import AnthropicModel
model = AnthropicModel(
    model_name="claude-3-5-sonnet-20241022",
    max_tokens=1000
)
Google Gemini
Google’s Gemini models.
Environment Variables
export GOOGLE_API_KEY="..."
Config
models:
  - type: gemini
    args:
      model_name: gemini-pro
Available Models
| Model | Description |
|---|---|
| gemini-pro | General purpose |
| gemini-pro-vision | Multimodal |
Cohere
Cohere Command models.
Environment Variables
export CO_API_KEY="..."
# or
export COHERE_API_KEY="..."
Config
models:
  - type: cohere
    args:
      model_name: command
OpenRouter
Access 200+ models through OpenRouter’s unified API.
OpenRouter provides a unified API gateway to models from OpenAI, Anthropic, Google, Meta, Mistral, and many others. It uses an OpenAI-compatible API surface, making it easy to switch between providers.
Environment Variables
export OPENROUTER_API_KEY="sk-or-..."
# Optional: Attribution headers
export OPENROUTER_HTTP_REFERER="https://your-app.com"
export OPENROUTER_APP_TITLE="My App"
Config
models:
  - type: openrouter
    args:
      model_name: openai/gpt-4o
Available Models
OpenRouter supports 200+ models. Common examples:
| Model | Provider | Model Name |
|---|---|---|
| GPT-4o | OpenAI | openai/gpt-4o |
| GPT-4o Mini | OpenAI | openai/gpt-4o-mini |
| Claude 3.5 Sonnet | Anthropic | anthropic/claude-3.5-sonnet |
| Claude 3 Opus | Anthropic | anthropic/claude-3-opus |
| Gemini Pro | Google | google/gemini-pro |
| Llama 3.1 70B | Meta | meta-llama/llama-3.1-70b-instruct |
| Mixtral 8x7B | Mistral | mistralai/mixtral-8x7b-instruct |
| DeepSeek V3 | DeepSeek | deepseek/deepseek-chat |
See OpenRouter Models for the full list.
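The model names in the table follow a `provider/model` pattern, and some names carry a `:variant` suffix. A small parser makes the structure explicit. This is a sketch for illustration; `split_model_name` is a hypothetical helper, not an insideLLMs function.

```python
def split_model_name(model_name):
    """Split 'provider/model[:variant]' into (provider, model, variant or None)."""
    provider, sep, rest = model_name.partition("/")
    if not sep or not provider or not rest:
        raise ValueError(f"expected 'provider/model', got {model_name!r}")
    model, _, variant = rest.partition(":")
    return provider, model, variant or None


print(split_model_name("meta-llama/llama-3.1-70b-instruct"))
print(split_model_name("anthropic/claude-3.5-sonnet:beta"))
```

A bare name such as `gpt-4o` raises `ValueError`, which catches the common mistake of passing an OpenAI-style name to OpenRouter.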
Python
from insideLLMs.models import OpenRouterModel
model = OpenRouterModel(
    model_name="anthropic/claude-3.5-sonnet",
    http_referer="https://my-app.com",  # Optional
    app_title="My App"  # Optional
)
response = model.generate("Hello, world!")
Common Options
| Option | Type | Default | Description |
|---|---|---|---|
| model_name | str | "openai/gpt-4o-mini" | Model identifier (provider/model format) |
| api_key | str | None | API key (or use OPENROUTER_API_KEY env) |
| base_url | str | "https://openrouter.ai/api/v1" | API endpoint |
| http_referer | str | None | Referer header for attribution |
| app_title | str | None | App title header for attribution |
| temperature | float | 1.0 | Sampling temperature |
| max_tokens | int | None | Maximum response tokens |
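Because OpenRouter exposes an OpenAI-compatible surface, the shape of a request can be sketched with the standard library alone. The example below builds a chat completion request but does not send it. It assumes that `http_referer` and `app_title` correspond to OpenRouter's `HTTP-Referer` and `X-Title` attribution headers; how insideLLMs maps them internally is not stated on this page, and `build_chat_request` is a hypothetical helper.

```python
import json
import urllib.request

BASE_URL = "https://openrouter.ai/api/v1"  # default base_url from the table above


def build_chat_request(api_key, model_name, prompt, http_referer=None, app_title=None):
    """Build (but do not send) an OpenAI-style chat completion request."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # Optional attribution headers; omitted entirely when not provided.
    if http_referer:
        headers["HTTP-Referer"] = http_referer
    if app_title:
        headers["X-Title"] = app_title
    body = {
        "model": model_name,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_chat_request(
    "sk-or-placeholder", "openai/gpt-4o-mini", "Hello, world!", app_title="My App"
)
print(request.full_url)
```

Inspecting the request object before sending it is a cheap way to confirm which headers and model name will go over the wire.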
Provider Routing
OpenRouter can route requests to specific providers:
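# Select a model variant with a ":suffix" (here ":beta");
# variant semantics are defined by OpenRouter, so check the model's page
model = OpenRouterModel(
    model_name="anthropic/claude-3.5-sonnet:beta"
)
# Send a provider preference via extra headers
# (confirm the header name against OpenRouter's provider-routing docs)
model = OpenRouterModel(
    model_name="openai/gpt-4o",
    extra_headers={"X-OpenRouter-Provider": "any"}
)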
HuggingFace
HuggingFace Transformers models (local or API).
Environment Variables
export HUGGINGFACEHUB_API_TOKEN="hf_..." # Optional for private models
Config (Local)
models:
  - type: huggingface
    args:
      model_name: meta-llama/Llama-2-7b-chat-hf
      device: cuda  # or cpu, mps
Config (API)
models:
  - type: huggingface
    args:
      model_name: meta-llama/Llama-2-7b-chat-hf
      use_api: true
Python
from insideLLMs.models import HuggingFaceModel
model = HuggingFaceModel(
    model_name="meta-llama/Llama-2-7b-chat-hf",
    device="cuda"
)
Common Options
| Option | Type | Default | Description |
|---|---|---|---|
| model_name | str | Required | HF model identifier |
| device | str | "auto" | cuda, cpu, mps |
| use_api | bool | False | Use HF Inference API |
| torch_dtype | str | "auto" | float16, bfloat16, float32 |
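The table lists `"auto"` as the default `device`, but this page does not say how that value is resolved. One plausible resolution order, sketched below, prefers CUDA, then Apple's MPS backend, then CPU. `pick_device` is a hypothetical helper; it falls back to `"cpu"` when PyTorch is not installed.

```python
def pick_device():
    """Return 'cuda', 'mps', or 'cpu' based on what PyTorch reports as available."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to accelerate.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch builds have no torch.backends.mps attribute.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(pick_device())
```

The result can be passed explicitly as `device=` when the automatic choice needs to be pinned for reproducibility.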
Ollama
Local models via Ollama.
Setup
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull a model
ollama pull llama3
Config
models:
  - type: ollama
    args:
      model_name: llama3
      base_url: http://localhost:11434
Available Models
Any model available via ollama pull:
| Model | Command |
|---|---|
| Llama 3 | ollama pull llama3 |
| Mistral | ollama pull mistral |
| CodeLlama | ollama pull codellama |
| Gemma | ollama pull gemma |
Python
from insideLLMs.models import OllamaModel
model = OllamaModel(
    model_name="llama3",
    base_url="http://localhost:11434"
)
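Before pointing a config at Ollama, it can be useful to confirm that the server is running and that the model has been pulled. The sketch below queries Ollama's `/api/tags` endpoint, which lists locally available models. `list_ollama_models` is a hypothetical helper and is independent of insideLLMs.

```python
import json
import urllib.request


def list_ollama_models(base_url="http://localhost:11434", timeout=2.0):
    """Return names of locally pulled models, or [] if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON reply.
        return []
    return [entry["name"] for entry in payload.get("models", [])]
```

Calling `list_ollama_models()` against a running server returns tagged names such as `llama3:latest`, so compare by prefix rather than exact match when checking for a model.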
vLLM
High-performance local inference with vLLM.
Setup
pip install vllm
Config
models:
  - type: vllm
    args:
      model_name: meta-llama/Llama-2-7b-chat-hf
      tensor_parallel_size: 1
llama.cpp
CPU-optimised local inference.
Setup
pip install llama-cpp-python
Config
models:
  - type: llamacpp
    args:
      model_path: /path/to/model.gguf
      n_ctx: 2048
DummyModel
Testing model that returns fixed responses.
Config
models:
  - type: dummy
    args:
      name: test_model
      canned_response: "This is a test response."
Python
from insideLLMs.models import DummyModel
model = DummyModel(name="test", canned_response="Fixed response")
Use Cases
- Testing: Verify framework behaviour without API costs
- CI/CD: Deterministic baseline runs
- Development: Build and debug probes
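The testing and CI use cases above come down to one property: the same prompts always yield the same responses. The sketch below shows that pattern with a local stand-in so it runs without insideLLMs installed. `CannedModel` and `run_probe` are illustrative names, and the stand-in assumes that DummyModel ignores the prompt, as "returns fixed responses" suggests.

```python
class CannedModel:
    """Local stand-in mirroring DummyModel's constructor and generate() shape."""

    def __init__(self, name, canned_response):
        self.name = name
        self.canned_response = canned_response

    def generate(self, prompt, **kwargs):
        # Ignores the prompt and always returns the same text.
        return self.canned_response


def run_probe(model, prompts):
    """Toy probe: collect one response per prompt."""
    return {prompt: model.generate(prompt) for prompt in prompts}


def test_probe_is_deterministic():
    model = CannedModel(name="test", canned_response="Fixed response")
    first = run_probe(model, ["a", "b"])
    second = run_probe(model, ["a", "b"])
    assert first == second == {"a": "Fixed response", "b": "Fixed response"}


test_probe_is_deterministic()
```

With insideLLMs installed, swapping `CannedModel` for `DummyModel` should leave the test unchanged, since both take `name` and `canned_response`.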
Using the Registry
Get models by name:
from insideLLMs.registry import model_registry, ensure_builtins_registered
ensure_builtins_registered()
# Get a model
model = model_registry.get("openai", model_name="gpt-4o")
# List available models
print(model_registry.list())
# ['openai', 'anthropic', 'gemini', 'cohere', 'huggingface', 'ollama', 'dummy', ...]
Common Interface
All models implement the same interface:
# Text generation
response = model.generate("prompt", temperature=0.7)
# Chat/multi-turn
response = model.chat([
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello!"}
])
# Streaming
for chunk in model.stream("prompt"):
    print(chunk, end="")
# Model info
info = model.info()
# ModelInfo(name="gpt-4o", provider="OpenAI", model_id="gpt-4o", ...)
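Because every model shares these methods, helper code can be written once against the interface rather than per provider. The sketch below expresses part of the interface as a structural type and adds a helper that gathers a stream into a single string. It assumes that stream chunks are plain strings, as the `print(chunk, end="")` usage above suggests; `TextModel`, `stream_to_text`, and `WordStreamer` are hypothetical names.

```python
from typing import Iterator, Protocol


class TextModel(Protocol):
    """Structural type covering generate() and stream() as used above."""

    def generate(self, prompt: str, **kwargs) -> str: ...

    def stream(self, prompt: str, **kwargs) -> Iterator[str]: ...


def stream_to_text(model: TextModel, prompt: str) -> str:
    """Consume a stream and return the concatenated text."""
    return "".join(model.stream(prompt))


class WordStreamer:
    """Toy model that streams its prompt back one word at a time."""

    def generate(self, prompt: str, **kwargs) -> str:
        return prompt

    def stream(self, prompt: str, **kwargs) -> Iterator[str]:
        for index, word in enumerate(prompt.split(" ")):
            # Re-attach the separating space to every word after the first.
            yield word if index == 0 else " " + word


print(stream_to_text(WordStreamer(), "hello streaming world"))
```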
Creating Custom Models
See the API Reference for the full Model interface.
Basic structure:
from insideLLMs.models.base import Model
class MyModel(Model):
    def __init__(self, name: str = "my_model", **kwargs):
        super().__init__(name=name, **kwargs)

    def generate(self, prompt: str, **kwargs) -> str:
        # Your implementation
        return "response"

    # info() is inherited from Model and returns ModelInfo automatically
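As one example of what a custom model might do, the sketch below wraps any object that has a `generate()` method and caches responses by prompt and keyword arguments. It is written as a plain class so that it runs without insideLLMs; in a real project it would presumably subclass `Model` as in the structure above. `CachingModel` and `CountingStub` are hypothetical names.

```python
class CachingModel:
    """Wrap a model and memoise generate() by prompt and keyword arguments."""

    def __init__(self, inner):
        self.inner = inner
        self._cache = {}

    def generate(self, prompt, **kwargs):
        # Keyword values must be hashable (numbers, strings) to form a cache key.
        key = (prompt, tuple(sorted(kwargs.items())))
        if key not in self._cache:
            self._cache[key] = self.inner.generate(prompt, **kwargs)
        return self._cache[key]


class CountingStub:
    """Toy inner model that counts how often it is actually called."""

    def __init__(self):
        self.calls = 0

    def generate(self, prompt, **kwargs):
        self.calls += 1
        return f"echo: {prompt}"


stub = CountingStub()
cached = CachingModel(stub)
cached.generate("hi", temperature=0.0)
cached.generate("hi", temperature=0.0)  # served from the cache
print(stub.calls)
```

Caching is only meaningful when the wrapped model is deterministic for the given settings, for example at temperature 0.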