Models Catalog

Complete reference for all supported model providers.

Overview

Provider Model Type API Key Required
OpenAI openai Yes
Anthropic anthropic Yes
Google/Gemini gemini Yes
Cohere cohere Yes
HuggingFace huggingface Optional
Ollama ollama No
vLLM vllm No
llama.cpp llamacpp No
DummyModel dummy No

OpenAI

OpenAI models (GPT-4o, GPT-4, GPT-3.5, etc.).

Environment Variables

export OPENAI_API_KEY="sk-..."
export OPENAI_ORG_ID="org-..."  # Optional

Config

models:
  - type: openai
    args:
      model_name: gpt-4o
      temperature: 0.7
      max_tokens: 1000

Available Models

Model Description
gpt-4o Latest GPT-4 Omni
gpt-4o-mini Smaller, faster GPT-4o
gpt-4-turbo GPT-4 Turbo
gpt-3.5-turbo Fast and affordable

Python

from insideLLMs.models import OpenAIModel

model = OpenAIModel(
    model_name="gpt-4o",
    temperature=0.7,
    max_tokens=1000
)

response = model.generate("Hello, world!")

Common Options

Option Type Default Description
model_name str "gpt-4o-mini" Model identifier
temperature float 1.0 Sampling temperature
max_tokens int None Max response tokens
top_p float 1.0 Nucleus sampling
timeout int 60 Request timeout in seconds

Anthropic

Anthropic Claude models.

Environment Variables

export ANTHROPIC_API_KEY="sk-ant-..."

Config

models:
  - type: anthropic
    args:
      model_name: claude-3-5-sonnet-20241022
      max_tokens: 1000

Available Models

Model Description
claude-3-5-sonnet-20241022 Latest Sonnet
claude-3-opus-20240229 Most capable
claude-3-haiku-20240307 Fastest

Python

from insideLLMs.models import AnthropicModel

model = AnthropicModel(
    model_name="claude-3-5-sonnet-20241022",
    max_tokens=1000
)

Google Gemini

Google’s Gemini models.

Environment Variables

export GOOGLE_API_KEY="..."

Config

models:
  - type: gemini
    args:
      model_name: gemini-pro

Available Models

Model Description
gemini-pro General purpose
gemini-pro-vision Multimodal
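
Python

A usage sketch following the same pattern as the other providers; the GeminiModel import is an assumption based on the naming convention above, so check the API Reference for the exact class name.

from insideLLMs.models import GeminiModel  # class name assumed by analogy with OpenAIModel

model = GeminiModel(model_name="gemini-pro")
response = model.generate("Hello, world!")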

Cohere

Cohere Command models.

Environment Variables

export CO_API_KEY="..."
# or
export COHERE_API_KEY="..."

Config

models:
  - type: cohere
    args:
      model_name: command
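
Python

A usage sketch mirroring the config above; the CohereModel import is an assumption based on the naming pattern of the other providers, so check the API Reference for the exact class name.

from insideLLMs.models import CohereModel  # class name assumed by analogy with OpenAIModel

model = CohereModel(model_name="command")
response = model.generate("Hello, world!")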

HuggingFace

HuggingFace Transformers models (local or API).

Environment Variables

export HUGGINGFACEHUB_API_TOKEN="hf_..."  # Optional: needed for gated/private models and the Inference API

Config (Local)

models:
  - type: huggingface
    args:
      model_name: meta-llama/Llama-2-7b-chat-hf
      device: cuda  # or cpu, mps

Config (API)

models:
  - type: huggingface
    args:
      model_name: meta-llama/Llama-2-7b-chat-hf
      use_api: true

Python

from insideLLMs.models import HuggingFaceModel

model = HuggingFaceModel(
    model_name="meta-llama/Llama-2-7b-chat-hf",
    device="cuda"
)
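
To use the hosted Inference API instead of loading weights locally, pass use_api=True (the same option shown in Config (API) above) and set HUGGINGFACEHUB_API_TOKEN:

model = HuggingFaceModel(
    model_name="meta-llama/Llama-2-7b-chat-hf",
    use_api=True  # route requests to the HF Inference API instead of loading local weights
)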

Common Options

Option Type Default Description
model_name str Required HF model identifier
device str "auto" cuda, cpu, mps
use_api bool False Use HF Inference API
torch_dtype str "auto" float16, bfloat16, float32

Ollama

Local models via Ollama.

Setup

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model
ollama pull llama3

Config

models:
  - type: ollama
    args:
      model_name: llama3
      base_url: http://localhost:11434

Available Models

Any model available via ollama pull:

Model Command
Llama 3 ollama pull llama3
Mistral ollama pull mistral
CodeLlama ollama pull codellama
Gemma ollama pull gemma

Python

from insideLLMs.models import OllamaModel

model = OllamaModel(
    model_name="llama3",
    base_url="http://localhost:11434"
)

vLLM

High-performance local inference with vLLM.

Setup

pip install vllm

Config

models:
  - type: vllm
    args:
      model_name: meta-llama/Llama-2-7b-chat-hf
      tensor_parallel_size: 1
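
Python

A usage sketch mirroring the config above; the VLLMModel import is an assumption based on the naming pattern of the other providers.

from insideLLMs.models import VLLMModel  # class name assumed by analogy with OpenAIModel

model = VLLMModel(
    model_name="meta-llama/Llama-2-7b-chat-hf",
    tensor_parallel_size=1  # number of GPUs to shard the model across
)
response = model.generate("Hello, world!")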

llama.cpp

CPU-optimised local inference for GGUF models.

Setup

pip install llama-cpp-python

Config

models:
  - type: llamacpp
    args:
      model_path: /path/to/model.gguf
      n_ctx: 2048
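
Python

A usage sketch mirroring the config above; the LlamaCppModel import is an assumption based on the naming pattern of the other providers.

from insideLLMs.models import LlamaCppModel  # class name assumed by analogy with OpenAIModel

model = LlamaCppModel(
    model_path="/path/to/model.gguf",
    n_ctx=2048  # context window in tokens
)
response = model.generate("Hello, world!")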

DummyModel

A testing model that returns fixed responses; no API calls are made.

Config

models:
  - type: dummy
    args:
      name: test_model
      response: "This is a test response."

Python

from insideLLMs.models import DummyModel

model = DummyModel(name="test", response="Fixed response")
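
Because the response is a fixed string, assertions against it are deterministic (this assumes generate returns the configured response verbatim, regardless of prompt):

assert model.generate("any prompt") == "Fixed response"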

Use Cases

  • Testing: Verify framework behaviour without API costs
  • CI/CD: Deterministic baseline runs
  • Development: Build and debug probes

Using the Registry

Get models by name:

from insideLLMs.registry import model_registry, ensure_builtins_registered

ensure_builtins_registered()

# Get a model
model = model_registry.get("openai", model_name="gpt-4o")

# List available models
print(model_registry.list())
# ['openai', 'anthropic', 'gemini', 'cohere', 'huggingface', 'ollama', 'dummy', ...]

Common Interface

All models implement the same interface:

# Text generation
response = model.generate("prompt", temperature=0.7)

# Chat/multi-turn
response = model.chat([
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello!"}
])

# Streaming
for chunk in model.stream("prompt"):
    print(chunk, end="")

# Model info
info = model.info()
# {"name": "gpt-4o", "provider": "openai", "model_id": "gpt-4o", ...}

Creating Custom Models

See API Reference for the full Model interface.

Basic structure:

from insideLLMs.models.base import Model

class MyModel(Model):
    def __init__(self, name: str = "my_model", **kwargs):
        super().__init__(name=name, **kwargs)
    
    def generate(self, prompt: str, **kwargs) -> str:
        # Your implementation
        return "response"
    
    def info(self) -> dict:
        return {"name": self.name, "provider": "custom"}