Models Catalog
Complete reference for all supported model providers.
Overview
| Provider | Model Type | API Key Required |
|---|---|---|
| OpenAI | openai | Yes |
| Anthropic | anthropic | Yes |
| Google/Gemini | gemini | Yes |
| Cohere | cohere | Yes |
| HuggingFace | huggingface | Optional |
| Ollama | ollama | No |
| vLLM | vllm | No |
| llama.cpp | llamacpp | No |
| DummyModel | dummy | No |
OpenAI
OpenAI models (GPT-4, GPT-3.5, etc.).
Environment Variables
export OPENAI_API_KEY="sk-..."
export OPENAI_ORG_ID="org-..." # Optional
Config
models:
  - type: openai
    args:
      model_name: gpt-4o
      temperature: 0.7
      max_tokens: 1000
Available Models
| Model | Description |
|---|---|
| gpt-4o | Latest GPT-4 Omni |
| gpt-4o-mini | Smaller, faster variant of GPT-4o |
| gpt-4-turbo | GPT-4 Turbo |
| gpt-3.5-turbo | Fast and affordable |
Python
from insideLLMs.models import OpenAIModel
model = OpenAIModel(
    model_name="gpt-4o",
    temperature=0.7,
    max_tokens=1000
)
response = model.generate("Hello, world!")
Common Options
| Option | Type | Default | Description |
|---|---|---|---|
| model_name | str | "gpt-4o-mini" | Model identifier |
| temperature | float | 1.0 | Sampling temperature |
| max_tokens | int | None | Max response tokens |
| top_p | float | 1.0 | Nucleus sampling |
| timeout | int | 60 | Request timeout in seconds |
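The options above can be combined at construction time. A minimal sketch, assuming each option in the table is accepted as a constructor keyword argument (only model_name, temperature, and max_tokens appear in the earlier example; top_p and timeout are assumed to work the same way):

from insideLLMs.models import OpenAIModel

model = OpenAIModel(
    model_name="gpt-4o-mini",
    temperature=0.2,
    max_tokens=500,
    top_p=0.9,
    timeout=30,
)
response = model.generate("Summarise nucleus sampling in one sentence.")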
Anthropic
Anthropic Claude models.
Environment Variables
export ANTHROPIC_API_KEY="sk-ant-..."
Config
models:
  - type: anthropic
    args:
      model_name: claude-3-5-sonnet-20241022
      max_tokens: 1000
Available Models
| Model | Description |
|---|---|
| claude-3-5-sonnet-20241022 | Latest Sonnet |
| claude-3-opus-20240229 | Most capable |
| claude-3-haiku-20240307 | Fastest |
Python
from insideLLMs.models import AnthropicModel
model = AnthropicModel(
    model_name="claude-3-5-sonnet-20241022",
    max_tokens=1000
)
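Once constructed, the instance exposes the same interface as every other provider (see Common Interface below), for example:

response = model.chat([
    {"role": "user", "content": "Hello, Claude!"}
])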
Google Gemini
Google’s Gemini models.
Environment Variables
export GOOGLE_API_KEY="..."
Config
models:
  - type: gemini
    args:
      model_name: gemini-pro
Available Models
| Model | Description |
|---|---|
| gemini-pro | General purpose |
| gemini-pro-vision | Multimodal |
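No dedicated Python class is shown for Gemini on this page; one way to construct the model, assuming the registry pattern described under Using the Registry below, is:

from insideLLMs.registry import model_registry, ensure_builtins_registered

ensure_builtins_registered()
model = model_registry.get("gemini", model_name="gemini-pro")
response = model.generate("List two use cases for multimodal models.")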
Cohere
Cohere Command models.
Environment Variables
export CO_API_KEY="..."
# or
export COHERE_API_KEY="..."
Config
models:
  - type: cohere
    args:
      model_name: command
HuggingFace
HuggingFace Transformers models (local or API).
Environment Variables
export HUGGINGFACEHUB_API_TOKEN="hf_..." # Optional for private models
Config (Local)
models:
  - type: huggingface
    args:
      model_name: meta-llama/Llama-2-7b-chat-hf
      device: cuda  # or cpu, mps
Config (API)
models:
  - type: huggingface
    args:
      model_name: meta-llama/Llama-2-7b-chat-hf
      use_api: true
Python
from insideLLMs.models import HuggingFaceModel
model = HuggingFaceModel(
    model_name="meta-llama/Llama-2-7b-chat-hf",
    device="cuda"
)
Common Options
| Option | Type | Default | Description |
|---|---|---|---|
| model_name | str | Required | HF model identifier |
| device | str | "auto" | "cuda", "cpu", or "mps" |
| use_api | bool | False | Use HF Inference API |
| torch_dtype | str | "auto" | "float16", "bfloat16", or "float32" |
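To use the hosted Inference API instead of loading weights locally, the same class can be switched over with use_api; a sketch, assuming use_api and torch_dtype are constructor keyword arguments as listed in the table above:

from insideLLMs.models import HuggingFaceModel

# Remote inference through the HF Inference API (no local GPU required).
api_model = HuggingFaceModel(
    model_name="meta-llama/Llama-2-7b-chat-hf",
    use_api=True,
)

# Local inference with reduced-precision weights.
local_model = HuggingFaceModel(
    model_name="meta-llama/Llama-2-7b-chat-hf",
    device="cuda",
    torch_dtype="bfloat16",
)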
Ollama
Local models via Ollama.
Setup
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull a model
ollama pull llama3
Config
models:
  - type: ollama
    args:
      model_name: llama3
      base_url: http://localhost:11434
Available Models
Any model available via ollama pull:
| Model | Command |
|---|---|
| Llama 3 | ollama pull llama3 |
| Mistral | ollama pull mistral |
| CodeLlama | ollama pull codellama |
| Gemma | ollama pull gemma |
Python
from insideLLMs.models import OllamaModel
model = OllamaModel(
    model_name="llama3",
    base_url="http://localhost:11434"
)
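Local models pair naturally with streaming, using the common stream() interface described later in this page:

for chunk in model.stream("Write a haiku about local inference."):
    print(chunk, end="", flush=True)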
vLLM
High-performance local inference with vLLM.
Setup
pip install vllm
Config
models:
  - type: vllm
    args:
      model_name: meta-llama/Llama-2-7b-chat-hf
      tensor_parallel_size: 1
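No Python snippet is shown for vLLM; as with Gemini above, one option is to go through the registry, assuming extra args such as tensor_parallel_size are passed through as keyword arguments:

from insideLLMs.registry import model_registry, ensure_builtins_registered

ensure_builtins_registered()
model = model_registry.get(
    "vllm",
    model_name="meta-llama/Llama-2-7b-chat-hf",
    tensor_parallel_size=1,
)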
llama.cpp
CPU-optimised local inference.
Setup
pip install llama-cpp-python
Config
models:
  - type: llamacpp
    args:
      model_path: /path/to/model.gguf
      n_ctx: 2048
DummyModel
A testing model that returns a fixed, predefined response.
Config
models:
  - type: dummy
    args:
      name: test_model
      response: "This is a test response."
Python
from insideLLMs.models import DummyModel
model = DummyModel(name="test", response="Fixed response")
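Because the response is fixed, the dummy model is convenient in unit tests; a minimal sketch, assuming generate() simply echoes back the configured response as the description above implies:

from insideLLMs.models import DummyModel

def test_runs_without_api_calls():
    model = DummyModel(name="test", response="Fixed response")
    assert model.generate("any prompt") == "Fixed response"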
Use Cases
- Testing: Verify framework behaviour without API costs
- CI/CD: Deterministic baseline runs
- Development: Build and debug probes
Using the Registry
Get models by name:
from insideLLMs.registry import model_registry, ensure_builtins_registered
ensure_builtins_registered()
# Get a model
model = model_registry.get("openai", model_name="gpt-4o")
# List available models
print(model_registry.list())
# ['openai', 'anthropic', 'gemini', 'cohere', 'huggingface', 'ollama', 'dummy', ...]
Common Interface
All models implement the same interface:
# Text generation
response = model.generate("prompt", temperature=0.7)
# Chat/multi-turn
response = model.chat([
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello!"}
])
# Streaming
for chunk in model.stream("prompt"):
    print(chunk, end="")
# Model info
info = model.info()
# {"name": "gpt-4o", "provider": "openai", "model_id": "gpt-4o", ...}
Creating Custom Models
See API Reference for the full Model interface.
Basic structure:
from insideLLMs.models.base import Model
class MyModel(Model):
    def __init__(self, name: str = "my_model", **kwargs):
        super().__init__(name=name, **kwargs)

    def generate(self, prompt: str, **kwargs) -> str:
        # Your implementation
        return "response"

    def info(self) -> dict:
        return {"name": self.name, "provider": "custom"}