PricingRegistry¶

In-memory store of per-model cost-per-token pricing.

Import¶

from llm_toll import PricingRegistry, default_registry

default_registry is the shared singleton used by @track_costs and all callbacks.

Constructor¶

registry = PricingRegistry()

Pre-loaded with built-in pricing for OpenAI, Anthropic, Gemini, and local/Ollama models.

Methods¶

`register_model(model, input_cost_per_token, output_cost_per_token)`¶

Register or override pricing for a model.

default_registry.register_model(
    "my-custom-model",
    input_cost_per_token=1e-06,
    output_cost_per_token=3e-06,
)

Raises ValueError if costs are negative.

`set_fallback_pricing(input_cost_per_token, output_cost_per_token)`¶

Set fallback pricing for any model not yet resolved.

default_registry.set_fallback_pricing(
    input_cost_per_token=1e-06,
    output_cost_per_token=3e-06,
)

Note

Models already queried (and cached as unknown) will not retroactively use the fallback. Set fallback before first use.

`get_cost(model, input_tokens, output_tokens) -> float`¶

Calculate the cost for a given model and token counts.

Lookup order:

Exact match in registered models
Longest-prefix match (cached for subsequent calls)
Fallback pricing if configured
Emits PricingMatrixOutdatedWarning and returns 0.0

cost = default_registry.get_cost("gpt-4o", input_tokens=100, output_tokens=50)

`load_remote_pricing(models) -> int`¶

Bulk-load pricing from a remote source. Merges into the registry, overriding existing entries. Returns the number of models loaded.

count = default_registry.load_remote_pricing({
    "new-model": (1e-06, 3e-06),
    "gpt-4o": (2.5e-06, 10e-06),  # override
})

`has_model(model) -> bool`¶

Check if a model has pricing registered (exact match only).

`list_models() -> list[str]`¶

Return all registered model names (sorted).

Model Resolution¶

The registry uses a two-tier lookup:

Exact match -- "gpt-4o" matches "gpt-4o" directly
Prefix match -- "gpt-4o-2024-08-06" matches "gpt-4o" via longest-prefix

Prefix matching uses boundary detection: the character after the prefix must be -, /, or end-of-string. This prevents "o3" from matching "o3000".

Namespace prefixes ending with / or - (e.g., "ollama/") always match any suffix.

Dynamic Cache¶

Prefix-resolved models are cached for subsequent lookups. The cache is bounded at 1,000 dynamic entries. When the limit is reached, the oldest dynamic entry is evicted. User-registered models (via register_model) are never evicted.

Constants¶

`COST_ROUND_PLACES`¶

from llm_toll.pricing import COST_ROUND_PLACES
# COST_ROUND_PLACES = 10

Number of decimal places to round costs to, preventing floating-point drift across many accumulated calls.

PricingRegistry¶

Import¶

Constructor¶

Methods¶

register_model(model, input_cost_per_token, output_cost_per_token)¶

set_fallback_pricing(input_cost_per_token, output_cost_per_token)¶

get_cost(model, input_tokens, output_tokens) -> float¶

load_remote_pricing(models) -> int¶

has_model(model) -> bool¶

list_models() -> list[str]¶