# track_costs

The main entry point for llm-toll: a decorator that wraps any function making LLM API calls to track costs, enforce budgets, and apply rate limits.
## Signature

```python
@track_costs
def my_func(): ...

@track_costs(
    project: str = "default",
    model: str | None = None,
    max_budget: float | None = None,
    reset: str | None = None,
    rate_limit: int | None = None,
    tpm_limit: int | None = None,
    extract_usage: Callable[..., tuple[str, int, int]] | None = None,
)
def my_func(): ...
```
## Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `project` | `str` | `"default"` | Project name for grouping usage in the store |
| `model` | `str \| None` | `None` | Override the model name (bypasses auto-detection) |
| `max_budget` | `float \| None` | `None` | Hard budget cap in USD |
| `reset` | `str \| None` | `None` | Budget reset period (e.g., `"monthly"`) |
| `rate_limit` | `int \| None` | `None` | Maximum requests per minute (RPM) |
| `tpm_limit` | `int \| None` | `None` | Maximum tokens per minute (TPM) |
| `extract_usage` | `Callable \| None` | `None` | Custom usage extractor: receives the return value and must return `(model, input_tokens, output_tokens)` |
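When auto-detection can't parse a response, a custom extractor can pull usage out manually. A minimal sketch, assuming an OpenAI-style response object (the attribute names here are an assumption about the client's response shape, not something llm-toll prescribes):

```python
def extract_openai_usage(response):
    # Must return (model, input_tokens, output_tokens), per the
    # extract_usage contract in the table above.
    return (
        response.model,
        response.usage.prompt_tokens,
        response.usage.completion_tokens,
    )
```

The extractor would then be passed as `@track_costs(extract_usage=extract_openai_usage)`.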
## Usage Modes
### Bare Decorator

```python
@track_costs
def chat(prompt):
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
```
### With Arguments
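This section's example appears to have been lost; the sketch below combines several of the parameters from the signature above, with illustrative values. The `try`/`except` stub only keeps the snippet runnable without llm-toll installed; in real use, `track_costs` comes from `llm_toll`.

```python
# Fallback stub so the sketch runs standalone; not part of llm-toll.
try:
    from llm_toll import track_costs
except ImportError:
    def track_costs(**_kwargs):
        def decorator(fn):
            return fn
        return decorator

@track_costs(
    project="my_app",
    max_budget=10.0,   # hard cap in USD (illustrative value)
    reset="monthly",   # budget resets each month
    rate_limit=60,     # at most 60 requests per minute
)
def chat(prompt):
    # Placeholder body; a real function would call the LLM client as in
    # the bare-decorator example above.
    return f"reply to {prompt}"
```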
### Async Functions

```python
@track_costs(project="my_app")
async def async_chat(prompt):
    response = await client.chat.completions.create(...)
    return response
```
### Async Generators

```python
@track_costs(project="my_app")
async def async_stream(prompt):
    stream = await client.chat.completions.create(..., stream=True)
    async for chunk in stream:
        yield chunk
```
## Behavior

- Sync functions -- Wrapped with a sync wrapper. Budget and rate-limit checks happen pre-call; usage extraction and logging happen post-call.
- Async coroutines -- Detected via `inspect.iscoroutinefunction()`. SQLite operations use `asyncio.to_thread()`.
- Async generators -- Detected via `inspect.isasyncgenfunction()`. Wrapped to yield chunks transparently with deferred cost tracking.
- Sync generators/streams -- Detected post-call by checking for `__next__` + `close()`. Wrapped via `wrap_sync_stream()`.
- Async streams (return value) -- If an async function returns an async iterable, it is wrapped via `wrap_async_stream()`.
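The sync-stream case above can be sketched as follows. The function names and structure are illustrative, not llm-toll's actual internals:

```python
def looks_like_sync_stream(obj):
    # Per the behavior notes: a return value is treated as a sync stream
    # when it supports iteration via __next__ and exposes close().
    return hasattr(obj, "__next__") and hasattr(obj, "close")

def wrap_stream_sketch(stream, record_usage):
    # Yield chunks transparently; defer cost recording until the stream
    # is exhausted or closed.
    try:
        for chunk in stream:
            yield chunk
    finally:
        record_usage()
```

Note that plain Python generators satisfy the `__next__` + `close()` check, while a list does not, which is why the check runs on the return value rather than on the function itself.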
## Return Value
The decorator is transparent -- it returns whatever the wrapped function returns. For streaming responses, it returns a wrapped generator/async generator that yields the same chunks.
## Helper Functions
### set_store(store)

```python
from llm_toll import set_store, SQLiteStore

set_store(SQLiteStore(db_path="/custom/path.db"))
set_store(None)  # Reset to default
```

Inject a custom store for the decorator to use. All subsequent `@track_costs` calls will use this store.
### set_reporter(reporter)

```python
from llm_toll import set_reporter, CostReporter

set_reporter(CostReporter(enabled=False))  # Suppress output
set_reporter(None)  # Reset to default
```

Inject a custom `CostReporter` for the decorator to use.
## Exceptions

| Exception | When |
|---|---|
| `BudgetExceededError` | Pre-call or post-call budget check fails |
| `LocalRateLimitError` | Pre-call rate limit check fails |
| `PricingMatrixOutdatedWarning` | Model not found in pricing registry (a warning, not an exception) |
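A hedged sketch of degrading gracefully when the hard budget cap is hit, rather than letting the error propagate. It assumes `BudgetExceededError` is importable from `llm_toll`; the stub class only keeps the snippet self-contained:

```python
# Fallback stub so the sketch runs standalone; not part of llm-toll.
try:
    from llm_toll import BudgetExceededError
except ImportError:
    class BudgetExceededError(Exception):
        pass

def safe_chat(chat_fn, prompt):
    # Return None instead of raising when the budget cap is exhausted;
    # callers can treat None as "budget spent, back off".
    try:
        return chat_fn(prompt)
    except BudgetExceededError:
        return None
```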