track_costs¶

The main entry point for llm-toll. It is a decorator that wraps any function making LLM API calls in order to track costs, enforce budgets, and apply rate limits.

Signature¶

@track_costs
def my_func(): ...

@track_costs(
    project: str = "default",
    model: str | None = None,
    max_budget: float | None = None,
    reset: str | None = None,
    rate_limit: int | None = None,
    tpm_limit: int | None = None,
    extract_usage: Callable[..., tuple[str, int, int]] | None = None,
)
def my_func(): ...

Parameters¶

Parameter	Type	Default	Description
`project`	`str`	`"default"`	Project name for grouping usage in the store
`model`	`str | None`	`None`	Override the model name (bypasses auto-detection)
`max_budget`	`float | None`	`None`	Hard budget cap in USD
`reset`	`str | None`	`None`	Budget reset period (e.g., `"monthly"`)
`rate_limit`	`int | None`	`None`	Maximum requests per minute (RPM)
`tpm_limit`	`int | None`	`None`	Maximum tokens per minute (TPM)
`extract_usage`	`Callable | None`	`None`	Custom usage extractor: receives the return value, must return `(model, input_tokens, output_tokens)`
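
As an illustration of the `extract_usage` contract, the sketch below defines an extractor for a hypothetical provider whose responses are plain dicts. The response shape (`model_id`, `usage.prompt_tokens`, `usage.completion_tokens`) and the name `usage_from_dict` are assumptions made for the example and are not part of llm-toll; only the `(model, input_tokens, output_tokens)` return shape comes from the table above.

```python
from typing import Any


def usage_from_dict(response: dict[str, Any]) -> tuple[str, int, int]:
    """Map a hypothetical dict-shaped response to (model, input_tokens, output_tokens)."""
    usage = response.get("usage", {})
    return (
        response["model_id"],                    # assumed field name for the model
        int(usage.get("prompt_tokens", 0)),      # input tokens, 0 if missing
        int(usage.get("completion_tokens", 0)),  # output tokens, 0 if missing
    )


# Stand-in for the value a wrapped function might return.
sample_response = {
    "model_id": "my-local-model",
    "usage": {"prompt_tokens": 12, "completion_tokens": 34},
    "text": "hello",
}

print(usage_from_dict(sample_response))
```

A function like this would presumably be supplied as `@track_costs(extract_usage=usage_from_dict)`.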

Usage Modes¶

Bare Decorator¶

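from openai import OpenAI
from llm_toll import track_costs

client = OpenAI()  # reads OPENAI_API_KEY from the environment

@track_costs
def chat(prompt):
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )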

With Arguments¶

@track_costs(project="my_app", max_budget=5.00, rate_limit=50)
def chat(prompt):
    ...

Async Functions¶

@track_costs(project="my_app")
async def async_chat(prompt):
    response = await client.chat.completions.create(...)
    return response

Async Generators¶

@track_costs(project="my_app")
async def async_stream(prompt):
    stream = await client.chat.completions.create(..., stream=True)
    async for chunk in stream:
        yield chunk

Behavior¶

Sync functions -- Wrapped with a sync wrapper. Budget and rate-limit checks run before the call; usage extraction and logging run after it.
Async coroutines -- Detected via inspect.iscoroutinefunction(). SQLite operations run through asyncio.to_thread() so they do not block the event loop.
Async generators -- Detected via inspect.isasyncgenfunction(). Wrapped so that chunks are yielded through unchanged, with cost tracking deferred.
Sync generators/streams -- Detected after the call by checking the return value for __next__ and close(). Wrapped via wrap_sync_stream().
Async streams (return value) -- If an async function returns an async iterable, it is wrapped via wrap_async_stream().
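
The dispatch described above can be sketched roughly as follows. This is not llm-toll's implementation: `tracked`, `classify`, and `looks_like_sync_stream` are hypothetical names, and the wrappers only pass calls through where the real decorator would run its checks and logging. It shows one way a single decorator can work both bare and with arguments while choosing a wrapper per function kind.

```python
import asyncio
import functools
import inspect


def classify(func):
    """Return which wrapper kind a decorator would pick for func."""
    if inspect.isasyncgenfunction(func):
        return "async_generator"
    if inspect.iscoroutinefunction(func):
        return "coroutine"
    return "sync"


def looks_like_sync_stream(value):
    """Post-call check: an iterator that also exposes close()."""
    return hasattr(value, "__next__") and hasattr(value, "close")


def tracked(func=None, *, project="default"):
    """Hypothetical stand-in decorator, usable bare or with keyword arguments."""

    def decorate(fn):
        kind = classify(fn)
        if kind == "async_generator":
            @functools.wraps(fn)
            async def wrapper(*args, **kwargs):
                async for chunk in fn(*args, **kwargs):
                    yield chunk  # pass chunks through unchanged
        elif kind == "coroutine":
            @functools.wraps(fn)
            async def wrapper(*args, **kwargs):
                return await fn(*args, **kwargs)
        else:
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                return fn(*args, **kwargs)
        wrapper.kind = kind  # recorded only so the demo can inspect it
        return wrapper

    # Bare use passes the function directly; use with arguments passes None.
    return decorate(func) if func is not None else decorate


@tracked
def sync_chat():
    return "sync"


@tracked(project="demo")
async def async_chat():
    return "async"


@tracked
async def stream_chat():
    yield "chunk"


async def collect():
    return [chunk async for chunk in stream_chat()]


print(sync_chat.kind, async_chat.kind, stream_chat.kind)
```

Note that a plain list iterator has `__next__` but no `close()`, so a check like `looks_like_sync_stream` would treat generators as streams while leaving simple iterators alone.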

Return Value¶

The decorator is transparent -- it returns whatever the wrapped function returns. For streaming responses, it returns a wrapped generator/async generator that yields the same chunks.
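
To make the idea of a transparent stream wrapper concrete, here is a minimal sketch of a generator that yields chunks unchanged and reports a total only when the stream ends. The name `wrap_stream_sketch` and the `on_done` callback are hypothetical, and counting chunks is just a stand-in for usage accounting; the internals of `wrap_sync_stream()` are not shown in this section.

```python
from typing import Callable, Iterable, Iterator


def wrap_stream_sketch(
    stream: Iterable[str], on_done: Callable[[int], None]
) -> Iterator[str]:
    """Yield chunks unchanged and report a total once the stream ends."""
    count = 0
    try:
        for chunk in stream:
            count += 1
            yield chunk
    finally:
        # Runs on exhaustion, on close(), or when an error propagates.
        on_done(count)


recorded = []
chunks = list(wrap_stream_sketch(iter(["Hel", "lo"]), recorded.append))

# A consumer that stops early still triggers the deferred report.
early = []
partial = wrap_stream_sketch(iter(["a", "b", "c"]), early.append)
first = next(partial)
partial.close()

print(chunks, recorded, first, early)
```

Putting the report in a `finally` block is a design choice: it keeps the accounting from being skipped when the caller abandons the stream partway through.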

Helper Functions¶

`set_store(store)`¶

from llm_toll import set_store, SQLiteStore

set_store(SQLiteStore(db_path="/custom/path.db"))
set_store(None)  # Reset to default

Inject a custom store for the decorator to use. All subsequent @track_costs calls will use this store.

`set_reporter(reporter)`¶

from llm_toll import set_reporter, CostReporter

set_reporter(CostReporter(enabled=False))  # Suppress output
set_reporter(None)  # Reset to default

Inject a custom CostReporter for the decorator to use.

Exceptions¶

Exception	When
`BudgetExceededError`	Pre-call or post-call budget check fails
`LocalRateLimitError`	Pre-call rate limit check fails
`PricingMatrixOutdatedWarning`	Model not found in the pricing registry (emitted as a warning, not raised as an exception)
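
One way a caller might react to `LocalRateLimitError` is to pause and retry. The sketch below uses a local stand-in class with the same name so that it runs without the library; in real code the exception would come from llm-toll, whose import path is not shown in this section. The helper `call_with_retry`, the fixed delay, and `flaky_chat` are all hypothetical.

```python
import time


class LocalRateLimitError(Exception):
    """Stand-in for the library exception so the sketch runs on its own."""


def call_with_retry(func, *args, retries=3, delay=0.01, **kwargs):
    """Retry func after a fixed pause whenever the local rate limit is hit."""
    for attempt in range(retries + 1):
        try:
            return func(*args, **kwargs)
        except LocalRateLimitError:
            if attempt == retries:
                raise  # out of attempts: let the caller decide
            time.sleep(delay)


attempts = {"n": 0}


def flaky_chat(prompt):
    """Simulates a decorated function that is rate-limited twice."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise LocalRateLimitError("simulated limit")
    return f"echo: {prompt}"


result = call_with_retry(flaky_chat, "hi")
print(result, attempts["n"])
```

Retrying is a reasonable fit for a rate limit because the condition clears with time. A `BudgetExceededError` is a different case: waiting will not help unless the budget resets, so it is probably better surfaced to the caller than retried.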