LiteLLM Integration¶

Overview¶

The LiteLLMCallback provides zero-decorator cost tracking for all LiteLLM completions. Register it once and every LiteLLM call is automatically tracked.

Setup¶

import litellm
from llm_toll import LiteLLMCallback

litellm.callbacks = [LiteLLMCallback(project="my-app", max_budget=10.0)]

# All litellm completions are now tracked automatically
response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)

Parameters¶

Parameter	Type	Default	Description
`project`	`str`	`"default"`	Project name for grouping usage
`max_budget`	`float \\| None`	`None`	Hard budget cap in USD
`store`	`BaseStore \\| None`	`None`	Custom store instance (defaults to shared store)
`reporter`	`CostReporter \\| None`	`None`	Custom reporter instance

Model Normalization¶

LiteLLM uses model strings with provider prefixes like "openai/gpt-4o" or "anthropic/claude-sonnet-4-20250514". The callback automatically strips the provider prefix when the suffix is a known model in the pricing registry.

This preserves namespace-prefixed models like "ollama/llama3" that rely on the "ollama/" pricing prefix.

Examples:

LiteLLM Model	Resolved Model
`openai/gpt-4o`	`gpt-4o`
`anthropic/claude-sonnet-4-20250514`	`claude-sonnet-4-20250514`
`ollama/llama3`	`ollama/llama3` (preserved)

Callback Methods¶

`log_success_event`¶

Called by LiteLLM after a successful completion. Extracts token usage from the response object using the same auto-detection pipeline as @track_costs, calculates cost, and logs it to the store.

`log_failure_event`¶

Called by LiteLLM after a failed completion. No-op -- failed calls are not tracked.

Budget Enforcement¶

When max_budget is set, the callback checks the budget on the next successful completion. If the accumulated cost exceeds the budget, BudgetExceededError is raised.

Note

Budget is checked at log time (after the call), not before. For pre-call budget enforcement, combine with the @track_costs decorator.

Combining with @track_costs¶

The callback and decorator can be used together:

import litellm
from llm_toll import LiteLLMCallback, track_costs

litellm.callbacks = [LiteLLMCallback(project="my-app")]

@track_costs(project="my-app", max_budget=10.0)
def important_call(prompt):
    return litellm.completion(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )

Warning

If you use both, the call may be tracked twice. Use one or the other unless you have a specific reason to combine them.