Gemini¶

Auto-Detection¶

The Gemini parser detects responses by checking for usage_metadata with prompt_token_count and candidates_token_count attributes (duck-typing). It works with:

google.genai.GenerateContentResponse objects
Any object with a compatible usage_metadata attribute structure

Supported Models¶

Model	Input Cost (per token)	Output Cost (per token)
gemini-1.5-pro	$1.25e-06	$5.00e-06
gemini-1.5-flash	$7.50e-08	$3.00e-07
gemini-2.0-flash	$1.00e-07	$4.00e-07
gemini-2.5-pro	$1.25e-06	$1.00e-05
gemini-2.5-flash	$1.50e-07	$6.00e-07

Basic Usage¶

from google import genai
from llm_toll import track_costs

client = genai.Client()

@track_costs(project="my_app", max_budget=5.00)
def chat(prompt):
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=prompt,
    )
    return response

Streaming¶

@track_costs(project="my_app")
def stream_chat(prompt):
    return client.models.generate_content_stream(
        model="gemini-2.5-flash",
        contents=prompt,
    )

for chunk in stream_chat("Hello"):
    print(chunk.text, end="")

The stream accumulator extracts:

Text from candidates[0].content.parts[0].text
Model version from chunk.model_version
Token counts from usage_metadata.prompt_token_count and candidates_token_count

Async¶

@track_costs(project="my_app")
async def async_chat(prompt):
    response = await client.aio.models.generate_content(
        model="gemini-2.5-flash",
        contents=prompt,
    )
    return response

Response Format¶

The parser expects this structure:

response.model_version                          # str (optional)
response.usage_metadata.prompt_token_count      # int
response.usage_metadata.candidates_token_count  # int

Model Name Extraction¶

Gemini responses carry the model version in model_version rather than model. The parser uses this when available. For streaming, the model version may not appear in every chunk.