Models Catalog
Complete reference for all supported model providers.
Overview
| Provider | Model Type | API Key Required |
|---|---|---|
| OpenAI | openai | Yes |
| Anthropic | anthropic | Yes |
| Google/Gemini | gemini | Yes |
| Cohere | cohere | Yes |
| HuggingFace | huggingface | Optional |
| Ollama | ollama | No |
| vLLM | vllm | No |
| llama.cpp | llamacpp | No |
| DummyModel | dummy | No |
OpenAI
OpenAI models (GPT-4, GPT-3.5, etc.).
Environment Variables
export OPENAI_API_KEY="sk-..."
export OPENAI_ORG_ID="org-..." # Optional
Config
models:
  - type: openai
    args:
      model_name: gpt-4o
      temperature: 0.7
      max_tokens: 1000
Available Models
| Model | Description |
|---|---|
| gpt-4o | Latest GPT-4 Omni |
| gpt-4o-mini | Smaller, faster variant of GPT-4o |
| gpt-4-turbo | GPT-4 Turbo |
| gpt-3.5-turbo | Fast and affordable |
Python
from insideLLMs.models import OpenAIModel
model = OpenAIModel(
    model_name="gpt-4o",
    temperature=0.7,
    max_tokens=1000
)
response = model.generate("Hello, world!")
Common Options
| Option | Type | Default | Description |
|---|---|---|---|
| model_name | str | "gpt-4o-mini" | Model identifier |
| temperature | float | 1.0 | Sampling temperature |
| max_tokens | int | None | Max response tokens |
| top_p | float | 1.0 | Nucleus sampling |
| timeout | int | 60 | Request timeout in seconds |
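The options above can be combined at construction time. A minimal sketch, assuming each option in the table is accepted as a constructor keyword argument (only model_name, temperature, and max_tokens appear in the earlier example; top_p and timeout are assumed to work the same way):

from insideLLMs.models import OpenAIModel

model = OpenAIModel(
    model_name="gpt-4o-mini",
    temperature=0.2,
    max_tokens=500,
    top_p=0.9,
    timeout=30,
)
response = model.generate("Summarise nucleus sampling in one sentence.")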
Anthropic
Anthropic Claude models.
Environment Variables
export ANTHROPIC_API_KEY="sk-ant-..."
Config
models:
  - type: anthropic
    args:
      model_name: claude-3-5-sonnet-20241022
      max_tokens: 1000
Available Models
| Model | Description |
|---|---|
| claude-3-5-sonnet-20241022 | Latest Sonnet |
| claude-3-opus-20240229 | Most capable |
| claude-3-haiku-20240307 | Fastest |
Python
from insideLLMs.models import AnthropicModel
model = AnthropicModel(
    model_name="claude-3-5-sonnet-20241022",
    max_tokens=1000
)
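Once constructed, the instance exposes the same interface as every other provider (see Common Interface below), for example:

response = model.chat([
    {"role": "user", "content": "Hello, Claude!"}
])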
Google Gemini
Google’s Gemini models.
Environment Variables
export GOOGLE_API_KEY="..."
Config
models:
  - type: gemini
    args:
      model_name: gemini-pro
Available Models
| Model | Description |
|---|---|
| gemini-pro | General purpose |
| gemini-pro-vision | Multimodal |
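No dedicated Python class is shown for Gemini on this page; one way to construct the model, assuming the registry pattern described under Using the Registry below, is:

from insideLLMs.registry import model_registry, ensure_builtins_registered

ensure_builtins_registered()
model = model_registry.get("gemini", model_name="gemini-pro")
response = model.generate("List two use cases for multimodal models.")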
Cohere
Cohere Command models.
Environment Variables
export CO_API_KEY="..."
# or
export COHERE_API_KEY="..."
Config
models:
  - type: cohere
    args:
      model_name: command
HuggingFace
HuggingFace Transformers models (local or API).
Environment Variables
export HUGGINGFACEHUB_API_TOKEN="hf_..." # Optional for private models
Config (Local)
models:
  - type: huggingface
    args:
      model_name: meta-llama/Llama-2-7b-chat-hf
      device: cuda  # or cpu, mps
Config (API)
models:
  - type: huggingface
    args:
      model_name: meta-llama/Llama-2-7b-chat-hf
      use_api: true
Python
from insideLLMs.models import HuggingFaceModel
model = HuggingFaceModel(
    model_name="meta-llama/Llama-2-7b-chat-hf",
    device="cuda"
)
Common Options
| Option | Type | Default | Description |
|---|---|---|---|
| model_name | str | Required | HF model identifier |
| device | str | "auto" | "cuda", "cpu", or "mps" |
| use_api | bool | False | Use HF Inference API |
| torch_dtype | str | "auto" | "float16", "bfloat16", or "float32" |
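To use the hosted Inference API instead of loading weights locally, the same class can be switched over with use_api; a sketch, assuming use_api and torch_dtype are constructor keyword arguments as listed in the table above:

from insideLLMs.models import HuggingFaceModel

# Remote inference through the HF Inference API (no local GPU required).
api_model = HuggingFaceModel(
    model_name="meta-llama/Llama-2-7b-chat-hf",
    use_api=True,
)

# Local inference with reduced-precision weights.
local_model = HuggingFaceModel(
    model_name="meta-llama/Llama-2-7b-chat-hf",
    device="cuda",
    torch_dtype="bfloat16",
)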
Ollama
Local models via Ollama.
Setup
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull a model
ollama pull llama3
Config
models:
  - type: ollama
    args:
      model_name: llama3
      base_url: http://localhost:11434
Available Models
Any model available via ollama pull:
| Model | Command |
|---|---|
| Llama 3 | ollama pull llama3 |
| Mistral | ollama pull mistral |
| CodeLlama | ollama pull codellama |
| Gemma | ollama pull gemma |
Python
from insideLLMs.models import OllamaModel
model = OllamaModel(
    model_name="llama3",
    base_url="http://localhost:11434"
)
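Local models pair naturally with streaming, using the common stream() interface described later in this page:

for chunk in model.stream("Write a haiku about local inference."):
    print(chunk, end="", flush=True)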
vLLM
High-performance local inference with vLLM.
Setup
pip install vllm
Config
models:
  - type: vllm
    args:
      model_name: meta-llama/Llama-2-7b-chat-hf
      tensor_parallel_size: 1
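No Python snippet is shown for vLLM; as with Gemini above, one option is to go through the registry, assuming extra args such as tensor_parallel_size are passed through as keyword arguments:

from insideLLMs.registry import model_registry, ensure_builtins_registered

ensure_builtins_registered()
model = model_registry.get(
    "vllm",
    model_name="meta-llama/Llama-2-7b-chat-hf",
    tensor_parallel_size=1,
)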
llama.cpp
CPU-optimised local inference.
Setup
pip install llama-cpp-python
Config
models:
  - type: llamacpp
    args:
      model_path: /path/to/model.gguf
      n_ctx: 2048
DummyModel
A testing model that returns a fixed, predefined response.
Config
models:
  - type: dummy
    args:
      name: test_model
      response: "This is a test response."
Python
from insideLLMs.models import DummyModel
model = DummyModel(name="test", response="Fixed response")
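Because the response is fixed, the dummy model is convenient in unit tests; a minimal sketch, assuming generate() simply echoes back the configured response as the description above implies:

from insideLLMs.models import DummyModel

def test_runs_without_api_calls():
    model = DummyModel(name="test", response="Fixed response")
    assert model.generate("any prompt") == "Fixed response"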
Use Cases
- Testing: Verify framework behaviour without API costs
- CI/CD: Deterministic baseline runs
- Development: Build and debug probes
Using the Registry
Get models by name:
from insideLLMs.registry import model_registry, ensure_builtins_registered
ensure_builtins_registered()
# Get a model
model = model_registry.get("openai", model_name="gpt-4o")
# List available models
print(model_registry.list())
# ['openai', 'anthropic', 'gemini', 'cohere', 'huggingface', 'ollama', 'dummy', ...]
Common Interface
All models implement the same interface:
# Text generation
response = model.generate("prompt", temperature=0.7)
# Chat/multi-turn
response = model.chat([
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello!"}
])
# Streaming
for chunk in model.stream("prompt"):
    print(chunk, end="")
# Model info
info = model.info()
# {"name": "gpt-4o", "provider": "openai", "model_id": "gpt-4o", ...}
Creating Custom Models
See API Reference for the full Model interface.
Basic structure:
from insideLLMs.models.base import Model
class MyModel(Model):
    def __init__(self, name: str = "my_model", **kwargs):
        super().__init__(name=name, **kwargs)

    def generate(self, prompt: str, **kwargs) -> str:
        # Your implementation
        return "response"

    def info(self) -> dict:
        return {"name": self.name, "provider": "custom"}