llm-toll
Lightweight, drop-in Python decorator to track costs, monitor token usage, and enforce budget and rate limits for LLM API calls.
Why llm-toll?
LLM API costs add up fast during prototyping and development. llm-toll gives you visibility and control with a single decorator -- no config files, no initialization steps, no external services.
from openai import OpenAI
from llm_toll import track_costs

client = OpenAI()  # reads OPENAI_API_KEY from the environment

@track_costs(project="my_app", max_budget=5.00)
def summarize(text):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": text}],
    )
    return response
That's it. Every call is logged, its cost is calculated, and the budget is enforced automatically.
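To make the idea concrete, here is a minimal sketch of how a cost-tracking decorator with a hard budget cap could work in principle. It is not llm-toll's implementation: `track_costs_sketch`, `BudgetExceeded`, the `PRICES` table, and the dict-shaped response are all hypothetical names and values chosen for illustration, and the prices are placeholders rather than real pricing.

```python
import functools

# Hypothetical prices in USD per million tokens (placeholders, not real pricing).
PRICES = {"gpt-4o": {"input": 2.50, "output": 10.00}}


class BudgetExceeded(RuntimeError):
    """Raised when cumulative cost has already exceeded the cap (hypothetical)."""


def track_costs_sketch(max_budget):
    """Illustrative stand-in for a cost-tracking decorator."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Refuse to run once the cumulative cost is over the threshold.
            if wrapper.spent > max_budget:
                raise BudgetExceeded(
                    f"spent ${wrapper.spent:.2f}, budget ${max_budget:.2f}"
                )
            response = func(*args, **kwargs)
            # Assumes the response is a dict carrying model and token counts.
            price = PRICES[response["model"]]
            cost = (
                response["prompt_tokens"] * price["input"]
                + response["completion_tokens"] * price["output"]
            ) / 1_000_000
            wrapper.spent += cost
            return response

        wrapper.spent = 0.0  # cumulative cost in USD, kept in memory only
        return wrapper

    return decorator
```

In this sketch the check runs before the wrapped function, so the call that pushes spending over the cap still completes and only later calls are blocked. A real tool would also persist the running total rather than keep it in memory.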
Key Features
- Drop-In Decorator -- One line of code. No config files, no initialization.
- Multi-Provider -- Built-in support for OpenAI, Anthropic, Gemini, and OpenAI-compatible endpoints.
- Hard Budget Caps -- Prevents execution when cumulative cost exceeds your threshold.
- Rate Limiting -- Local requests-per-minute (RPM) and tokens-per-minute (TPM) enforcement to help avoid HTTP 429 errors.
- Streaming Support -- Transparent cost tracking for sync and async streaming responses.
- Local Persistence -- SQLite-backed usage tracking across sessions. Optional PostgreSQL for teams.
- Cost Reporting -- Color-coded terminal summaries per call and per session.
- CLI & Web Dashboard -- View stats, export CSV, launch a browser-based analytics dashboard.
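As an illustration of what local RPM enforcement can look like, the following is a minimal sliding-window limiter. It is a sketch under stated assumptions, not llm-toll's internals: the class name `SlidingWindowLimiter`, its parameters, and the injectable `clock` are hypothetical and exist only to show the technique.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests calls in any window_seconds span (hypothetical)."""

    def __init__(self, max_requests, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.timestamps = deque()

    def try_acquire(self):
        """Return True and record the call if it fits in the window, else False."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_requests:
            self.timestamps.append(now)
            return True
        return False
```

A caller would check `try_acquire()` before each API request and wait or raise when it returns `False`. A TPM limit can follow the same pattern by storing token counts alongside the timestamps and summing them over the window.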
Supported Providers
| Provider | SDK Auto-Parsing | Streaming | Custom Overrides |
|---|---|---|---|
| OpenAI | Yes | Yes | Yes |
| Anthropic | Yes | Yes | Yes |
| Google Gemini | Yes | Yes | Yes |
| Local/Ollama | Via OpenAI-compatible endpoint | N/A | Rate limiting only |
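Auto-parsing across providers comes down to reading token counts from differently shaped response objects. The sketch below shows one way that could be done, using the usage field names exposed by the OpenAI chat completions, Anthropic messages, and Google Gemini SDK responses. The helper `extract_tokens` is a hypothetical name, and this is not how llm-toll is known to implement it.

```python
from types import SimpleNamespace


def extract_tokens(response):
    """Return (input_tokens, output_tokens) from a provider response (hypothetical helper)."""
    usage = getattr(response, "usage", None)
    if usage is not None:
        # OpenAI chat completions: usage.prompt_tokens / usage.completion_tokens
        if hasattr(usage, "prompt_tokens"):
            return usage.prompt_tokens, usage.completion_tokens
        # Anthropic messages: usage.input_tokens / usage.output_tokens
        if hasattr(usage, "input_tokens"):
            return usage.input_tokens, usage.output_tokens
    meta = getattr(response, "usage_metadata", None)
    if meta is not None:
        # Google Gemini: usage_metadata.prompt_token_count / candidates_token_count
        return meta.prompt_token_count, meta.candidates_token_count
    raise ValueError("unrecognized response shape")


# Stand-in objects that mimic the three response shapes for demonstration.
openai_like = SimpleNamespace(
    usage=SimpleNamespace(prompt_tokens=12, completion_tokens=34)
)
anthropic_like = SimpleNamespace(
    usage=SimpleNamespace(input_tokens=5, output_tokens=7)
)
gemini_like = SimpleNamespace(
    usage_metadata=SimpleNamespace(prompt_token_count=3, candidates_token_count=9)
)
```

Duck typing on attribute names keeps the helper free of any SDK import, which matters when only one provider's package is installed.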
Quick Install
See the Installation page for extras and provider-specific packages.
Quick Start
from openai import OpenAI
from llm_toll import track_costs

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Bare decorator -- auto-detects model and tokens from SDK response
@track_costs
def chat(prompt):
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )

# With budget and rate limits
@track_costs(project="my_app", max_budget=2.00, rate_limit=50)
def chat_limited(prompt):
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
See the Quick Start guide for more examples.