Retry Strategies

Handle transient failures gracefully.

The Problem

API calls fail. Rate limits hit. Networks time out. Your code crashes.

The Solution

Automatic retry with exponential backoff and circuit breakers.

from insideLLMs.pipeline import ModelPipeline, RetryMiddleware

pipeline = ModelPipeline(model)  # model: your existing model instance
pipeline.add_middleware(RetryMiddleware(
    max_attempts=3,
    initial_delay=1.0,
    exponential_base=2.0,
    max_delay=60.0
))

# Automatically retries on failure
response = pipeline.generate(prompt)

Retry Configuration

from insideLLMs.exceptions import RateLimitError, TimeoutError
from insideLLMs.retry import RetryConfig

config = RetryConfig(
    max_attempts=3,           # Try up to 3 times
    initial_delay=1.0,        # Start with 1 second delay
    exponential_base=2.0,     # Double delay each retry
    max_delay=60.0,           # Cap at 60 seconds
    jitter=True,              # Add randomness to prevent thundering herd
    retry_on=[TimeoutError, RateLimitError]  # Which errors to retry
)
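The schedule these parameters imply is easy to compute by hand. A minimal sketch, assuming the usual formula delay = initial_delay × exponential_base^(attempt − 1), capped at max_delay, with full jitter drawn uniformly from [0, delay] (the library's exact jitter scheme may differ):

import random

def backoff_delay(attempt, initial_delay=1.0, exponential_base=2.0,
                  max_delay=60.0, jitter=True):
    """Delay in seconds before retry number `attempt` (1-based)."""
    delay = initial_delay * exponential_base ** (attempt - 1)
    delay = min(delay, max_delay)          # cap to avoid unbounded waits
    if jitter:
        delay = random.uniform(0, delay)   # full jitter de-correlates clients
    return delay

# Without jitter: 1s, 2s, 4s, 8s, 16s, 32s, then capped at 60s
print([backoff_delay(n, jitter=False) for n in range(1, 8)])

Full jitter trades a predictable schedule for better spreading of clients that fail at the same moment.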

Backoff Strategies

Exponential Backoff

# Delays: 1s, 2s, 4s, 8s, ...
RetryMiddleware(
    initial_delay=1.0,
    exponential_base=2.0
)

Linear Backoff

With the formula above, exponential_base=1.0 keeps the delay constant, so a truly linear schedule (1s, 2s, 3s, ...) is not expressible with these two parameters. A base between 1 and 2 grows gently and approximates it over the first few attempts:

# Delays: 1s, 1.5s, 2.25s, 3.4s, ...
RetryMiddleware(
    initial_delay=1.0,
    exponential_base=1.5
)

Constant Delay

# Delays: 2s, 2s, 2s, ...
RetryMiddleware(
    initial_delay=2.0,
    exponential_base=1.0,  # a base of 1.0 keeps the delay fixed
    max_delay=2.0          # redundant here, but a harmless cap
)

Circuit Breaker

Prevent cascade failures by stopping requests after repeated failures.

from insideLLMs.retry import CircuitBreaker, CircuitBreakerOpen

breaker = CircuitBreaker(
    failure_threshold=5,      # Open after 5 failures
    recovery_timeout=60.0,    # Try again after 60s
    half_open_max_calls=3     # Test with 3 calls before fully closing
)

# Use with pipeline
pipeline.add_middleware(RetryMiddleware(circuit_breaker=breaker))

# Circuit opens after failures
try:
    response = pipeline.generate(prompt)
except CircuitBreakerOpen:
    print("Circuit open - too many failures")

Selective Retry

from insideLLMs.exceptions import RateLimitError, TimeoutError, ValidationError

# Retry only specific errors
RetryMiddleware(
    max_attempts=3,
    retry_on=[RateLimitError, TimeoutError],  # Retry these
    no_retry_on=[ValidationError]             # Don't retry these
)
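One plausible way the two lists combine, assuming explicit exclusions take precedence over the allow-list (verify against the library's behaviour before relying on it):

def should_retry(error, retry_on, no_retry_on=()):
    # Exclusions win first, then the allow-list decides
    if isinstance(error, tuple(no_retry_on)):
        return False
    return isinstance(error, tuple(retry_on))

should_retry(RateLimitError("429"), [RateLimitError], [ValidationError])   # True
should_retry(ValidationError("bad"), [RateLimitError], [ValidationError])  # False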

Retry with Backpressure

When failures are load-related, retrying at full concurrency can make things worse. Reducing concurrency on retry gives the upstream service room to recover.

# Reduce concurrency on retry
RetryMiddleware(
    max_attempts=3,
    reduce_concurrency_on_retry=True,
    min_concurrency=1
)
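The idea, sketched with a thread pool. Here call_api is a hypothetical stand-in for your model call, and halving the worker count between attempts is an assumed policy, not the library's documented behaviour:

import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def call_api(prompt):
    """Hypothetical stand-in for a real model call."""
    ...

def run_with_backpressure(prompts, concurrency=8, min_concurrency=1,
                          max_attempts=3, initial_delay=1.0):
    pending, results = list(prompts), {}
    for attempt in range(1, max_attempts + 1):
        failed = []
        with ThreadPoolExecutor(max_workers=concurrency) as pool:
            futures = {pool.submit(call_api, p): p for p in pending}
            for fut in as_completed(futures):
                prompt = futures[fut]
                try:
                    results[prompt] = fut.result()
                except Exception:
                    failed.append(prompt)   # retry this prompt next round
        if not failed:
            break
        pending = failed
        concurrency = max(min_concurrency, concurrency // 2)  # ease off the API
        time.sleep(initial_delay * 2 ** (attempt - 1))        # plus backoff
    return results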

Monitoring Retries

from insideLLMs.retry import RetryStats

stats = pipeline.get_retry_stats()  # returns a RetryStats
print(f"Total retries: {stats.total_retries}")
print(f"Success after retry: {stats.retry_successes}")
print(f"Failed after retries: {stats.retry_failures}")
print(f"Average attempts: {stats.avg_attempts:.1f}")

Configuration

# In harness config
retry:
  enabled: true
  max_attempts: 3
  initial_delay: 1.0
  exponential_base: 2.0
  max_delay: 60.0
  jitter: true
  circuit_breaker:
    enabled: true
    failure_threshold: 5
    recovery_timeout: 60.0
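If you need to build the objects from this file yourself, a minimal sketch with PyYAML. The filename, and the assumption that these keys map one-to-one onto RetryConfig and CircuitBreaker, are illustrative; the harness may already do this for you:

import yaml
from insideLLMs.retry import CircuitBreaker, RetryConfig

with open("harness.yaml") as f:               # hypothetical config path
    retry_cfg = yaml.safe_load(f)["retry"]

breaker_cfg = retry_cfg.pop("circuit_breaker", {})
retry_enabled = retry_cfg.pop("enabled", True)

config = RetryConfig(**retry_cfg)             # max_attempts, delays, jitter
breaker = (CircuitBreaker(**breaker_cfg)
           if breaker_cfg.pop("enabled", False) else None)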

Best Practices

Do:

  • Use exponential backoff for rate limits
  • Add jitter to prevent thundering herd
  • Set max_delay to prevent infinite waits
  • Use circuit breakers for cascading failures

Don’t:

  • Retry validation errors (they won’t succeed)
  • Set max_attempts too high (wastes time/money)
  • Retry without backoff (hammers the API)

Why This Matters

Without retry:

  • Transient failures crash your pipeline
  • Manual retry logic scattered everywhere
  • No protection against cascade failures
  • Wasted API calls on permanent errors

With retry:

  • Transient failures handled automatically
  • Centralised retry logic
  • Circuit breakers prevent cascades
  • Smart retry only on retriable errors

See Also