Quick Start
Basic Usage (Auto-detect)
For standard SDK clients, the decorator automatically extracts the model name and token counts from the response object:
from openai import OpenAI  # assumes the official OpenAI Python SDK
from llm_toll import track_costs
client = OpenAI()  # reads OPENAI_API_KEY from the environment
@track_costs(project="my_scraper", max_budget=2.00, reset="monthly")
def generate_summary(text):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": text}],
    )
    return response  # the decorator parses the usage from this object
The decorator intercepts the return value, extracts token usage via duck-typing, calculates the cost from the built-in pricing registry, and logs everything to the local SQLite database.
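The extraction and pricing steps described above can be approximated in a few lines. This is a minimal sketch, not llm_toll's implementation: the `PRICES_PER_MILLION` table, its rates, the `example-model` name, and the helpers `extract_usage_duck` and `estimate_cost` are all hypothetical, and the SQLite logging step is omitted.

```python
from types import SimpleNamespace

# Hypothetical (input, output) prices in USD per million tokens.
# Illustrative numbers only, not the library's pricing registry.
PRICES_PER_MILLION = {"example-model": (2.00, 8.00)}


def extract_usage_duck(response):
    """Return (model, input_tokens, output_tokens) from any response-like object."""
    usage = getattr(response, "usage", None)
    if usage is None:
        raise ValueError("response has no 'usage' attribute")
    # OpenAI-style usage objects expose prompt_tokens/completion_tokens;
    # Anthropic-style usage objects expose input_tokens/output_tokens.
    in_tokens = getattr(usage, "prompt_tokens", None)
    if in_tokens is None:
        in_tokens = getattr(usage, "input_tokens", 0)
    out_tokens = getattr(usage, "completion_tokens", None)
    if out_tokens is None:
        out_tokens = getattr(usage, "output_tokens", 0)
    return response.model, in_tokens, out_tokens


def estimate_cost(model, in_tokens, out_tokens):
    """Price a call using the hypothetical per-million-token table."""
    in_price, out_price = PRICES_PER_MILLION[model]
    return (in_tokens * in_price + out_tokens * out_price) / 1_000_000


# A stand-in object shaped like an SDK response.
fake = SimpleNamespace(
    model="example-model",
    usage=SimpleNamespace(prompt_tokens=1000, completion_tokens=500),
)
model, n_in, n_out = extract_usage_duck(fake)
cost = estimate_cost(model, n_in, n_out)
```

Checking attributes rather than types is what lets one code path handle responses from different SDKs without importing any of them.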
Bare Decorator
If you just want basic tracking with defaults, use the decorator without arguments:
@track_costs
def chat(prompt):
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
This uses project="default" with no budget cap or rate limits.
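For readers curious how one decorator can support both the bare and the parameterized form, the usual Python pattern is sketched below. The `track` decorator and its `calls` list are hypothetical stand-ins for illustration and say nothing about how `track_costs` is actually written.

```python
import functools


def track(func=None, *, project="default"):
    """Hypothetical decorator usable as @track or @track(project=...)."""

    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            wrapper.calls.append(project)  # stand-in for real tracking
            return result

        wrapper.calls = []
        return wrapper

    if func is not None:
        # Bare form: Python passed the decorated function directly.
        return decorate(func)
    # Called form: return the decorator that will receive the function.
    return decorate


@track
def bare():
    return 1


@track(project="my_scraper")
def configured():
    return 2
```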
Advanced Usage (Rate Limits & Explicit Models)
For custom setups or raw API requests, explicitly specify the model and limits:
from llm_toll import track_costs
@track_costs(
    model="claude-sonnet-4-20250514",
    rate_limit=50,     # max 50 requests per minute
    tpm_limit=40000,   # max 40k tokens per minute
    extract_usage=lambda res: (res['model'], res['in_tokens'], res['out_tokens'])
)
def custom_anthropic_call(prompt):
    # Placeholder: replace with custom logic that returns a dict containing
    # the 'model', 'in_tokens', and 'out_tokens' keys read by extract_usage.
    pass
The extract_usage callable receives the function's return value and must return a tuple of (model_name, input_tokens, output_tokens).
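When the lambda grows beyond one line, a named function can be easier to test. The sketch below assumes the wrapped function returns a dict shaped like the JSON body of the Anthropic Messages API (`model` plus `usage.input_tokens` and `usage.output_tokens`); the name `extract_from_raw_response` is hypothetical.

```python
def extract_from_raw_response(res):
    """Map a raw response dict to (model_name, input_tokens, output_tokens)."""
    usage = res.get("usage") or {}
    return (
        res["model"],
        int(usage.get("input_tokens", 0)),
        int(usage.get("output_tokens", 0)),
    )


# Sample payload with only the fields the extractor reads.
sample = {
    "model": "claude-sonnet-4-20250514",
    "usage": {"input_tokens": 120, "output_tokens": 45},
}
result = extract_from_raw_response(sample)
```

Such a function would be passed as `extract_usage=extract_from_raw_response` in place of the lambda.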
Error Handling
from llm_toll.exceptions import BudgetExceededError, LocalRateLimitError
try:
    result = generate_summary("some text")
except BudgetExceededError as e:
    print(f"Budget exceeded: ${e.current_cost:.4f} >= ${e.max_budget:.4f}")
except LocalRateLimitError as e:
    print(f"Rate limit hit, retry after {e.retry_after:.1f}s")
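Instead of only printing the delay, a caller may want to wait and try again. The helper below is a minimal sketch of that idea. `call_with_retry` and `StandInRateLimitError` are hypothetical names: the stand-in class exists only so the sketch runs without llm_toll installed, and real code would presumably pass `retry_on=LocalRateLimitError`. The only attribute it relies on is `retry_after`, as used in the handler above.

```python
import time


class StandInRateLimitError(Exception):
    """Stand-in for LocalRateLimitError; carries a retry_after delay in seconds."""

    def __init__(self, retry_after):
        super().__init__(f"retry after {retry_after}s")
        self.retry_after = retry_after


def call_with_retry(fn, *args, retry_on=StandInRateLimitError,
                    max_attempts=3, sleep=time.sleep, **kwargs):
    """Call fn, sleeping for exc.retry_after between rate-limited attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(*args, **kwargs)
        except retry_on as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(exc.retry_after)
```

A budget error should generally not be retried this way, since waiting does not lower the accumulated cost.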
Terminal Output
Each call prints a color-coded cost summary to stderr.
Session totals are available via the CostReporter:
from llm_toll import CostReporter
reporter = CostReporter()
# ... run your calls ...
reporter.report_session()
# [session] 5 calls, 750 in / 415 out, total $0.0058
What's Next?
- Cost Tracking -- How the tracking pipeline works
- Budget Enforcement -- Setting budget caps
- Rate Limiting -- RPM and TPM limits
- Streaming -- Streaming response support
- Providers -- Provider-specific guides