Gemini
Auto-Detection
The Gemini parser detects responses by duck typing: it checks for a usage_metadata attribute that has prompt_token_count and candidates_token_count attributes. It works with:
- google.genai.GenerateContentResponse objects
- Any object with a compatible usage_metadata attribute structure
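The duck-typing check described above can be sketched roughly as follows. This is only an illustration of the idea, not the library's actual code: the helper name `looks_like_gemini_response` is hypothetical, and the stand-in objects are built with `types.SimpleNamespace` so the example runs without the Gemini SDK.

```python
from types import SimpleNamespace


def looks_like_gemini_response(obj) -> bool:
    """Hypothetical helper: True if obj exposes Gemini-style token counts."""
    usage = getattr(obj, "usage_metadata", None)
    if usage is None:
        return False
    return hasattr(usage, "prompt_token_count") and hasattr(
        usage, "candidates_token_count"
    )


# A stand-in object with the same attribute shape passes the check.
fake = SimpleNamespace(
    usage_metadata=SimpleNamespace(prompt_token_count=12, candidates_token_count=30)
)
# An object with a different usage layout does not.
other = SimpleNamespace(usage=SimpleNamespace(prompt_tokens=12, completion_tokens=30))

gemini_detected = looks_like_gemini_response(fake)
other_detected = looks_like_gemini_response(other)
```

Because only attribute names are inspected, any test double with this shape should be treated the same way as a real response object.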
Supported Models
| Model | Input Cost (per token) | Output Cost (per token) |
|---|---|---|
| gemini-1.5-pro | $1.25e-06 | $5.00e-06 |
| gemini-1.5-flash | $7.50e-08 | $3.00e-07 |
| gemini-2.0-flash | $1.00e-07 | $4.00e-07 |
| gemini-2.5-pro | $1.25e-06 | $1.00e-05 |
| gemini-2.5-flash | $1.50e-07 | $6.00e-07 |
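As a worked example of how per-token prices translate into a request cost, the sketch below multiplies token counts by the rates in the table. The `PRICES` dictionary and `estimate_cost` function are illustrative names, not part of llm_toll, and the library's own calculation may differ (for example in rounding).

```python
# Per-token prices in USD, copied from the table above: (input, output).
PRICES = {
    "gemini-1.5-pro": (1.25e-06, 5.00e-06),
    "gemini-1.5-flash": (7.50e-08, 3.00e-07),
    "gemini-2.0-flash": (1.00e-07, 4.00e-07),
    "gemini-2.5-pro": (1.25e-06, 1.00e-05),
    "gemini-2.5-flash": (1.50e-07, 6.00e-07),
}


def estimate_cost(model: str, prompt_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: cost = input tokens * input price + output tokens * output price."""
    input_price, output_price = PRICES[model]
    return prompt_tokens * input_price + output_tokens * output_price


# 1,000 prompt tokens and 500 output tokens on gemini-2.5-flash:
# 1000 * 1.5e-07 + 500 * 6.0e-07 = 0.00015 + 0.00030 = 0.00045 USD
cost = estimate_cost("gemini-2.5-flash", 1_000, 500)
print(f"${cost:.6f}")  # prints $0.000450
```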
Basic Usage
from google import genai
from llm_toll import track_costs

client = genai.Client()

@track_costs(project="my_app", max_budget=5.00)
def chat(prompt):
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=prompt,
    )
    return response
Streaming
@track_costs(project="my_app")
def stream_chat(prompt):
    return client.models.generate_content_stream(
        model="gemini-2.5-flash",
        contents=prompt,
    )

for chunk in stream_chat("Hello"):
    print(chunk.text, end="")
The stream accumulator extracts:
- Text from candidates[0].content.parts[0].text
- Model version from chunk.model_version
- Token counts from usage_metadata.prompt_token_count and candidates_token_count
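The three extractions listed above could be combined along these lines. This is a minimal sketch, not the library's accumulator: `accumulate_stream` and `make_chunk` are hypothetical names, the chunks are stand-ins built with `SimpleNamespace`, and the code assumes that token counts reported on later chunks supersede earlier ones.

```python
from types import SimpleNamespace


def accumulate_stream(chunks):
    """Hypothetical sketch: collect text, model version, and token counts."""
    text_parts = []
    model_version = None
    prompt_tokens = 0
    output_tokens = 0
    for chunk in chunks:
        # Text lives at candidates[0].content.parts[0].text when present.
        candidates = getattr(chunk, "candidates", None) or []
        if candidates:
            content = getattr(candidates[0], "content", None)
            parts = getattr(content, "parts", None) or []
            if parts and getattr(parts[0], "text", None):
                text_parts.append(parts[0].text)
        # The model version may be missing on some chunks; keep the last one seen.
        model_version = getattr(chunk, "model_version", None) or model_version
        usage = getattr(chunk, "usage_metadata", None)
        if usage is not None:
            # Assumption: later counts replace earlier ones rather than adding up.
            prompt_tokens = getattr(usage, "prompt_token_count", None) or prompt_tokens
            output_tokens = getattr(usage, "candidates_token_count", None) or output_tokens
    return {
        "text": "".join(text_parts),
        "model": model_version,
        "prompt_tokens": prompt_tokens,
        "output_tokens": output_tokens,
    }


def make_chunk(text, model_version=None, usage=None):
    """Build a stand-in chunk with the attribute shape described above."""
    candidate = SimpleNamespace(content=SimpleNamespace(parts=[SimpleNamespace(text=text)]))
    return SimpleNamespace(candidates=[candidate], model_version=model_version, usage_metadata=usage)


chunks = [
    make_chunk("Hel", model_version="gemini-2.5-flash"),
    make_chunk("lo", usage=SimpleNamespace(prompt_token_count=3, candidates_token_count=2)),
]
summary = accumulate_stream(chunks)
```

Reading each field defensively with `getattr` keeps the sketch tolerant of chunks that omit text, usage data, or the model version.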
Async
@track_costs(project="my_app")
async def async_chat(prompt):
    response = await client.aio.models.generate_content(
        model="gemini-2.5-flash",
        contents=prompt,
    )
    return response
Response Format
The parser expects this structure:
response.model_version # str (optional)
response.usage_metadata.prompt_token_count # int
response.usage_metadata.candidates_token_count # int
Model Name Extraction
Gemini responses carry the model version in model_version rather than model. The parser uses this when available. For streaming, the model version may not appear in every chunk.
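One way to picture the lookup is a helper that reads model_version and falls back to a caller-supplied default when the attribute is absent or empty. The name `resolve_model_name` and the fallback behavior are assumptions made for illustration; the source does not say what the parser does when model_version is missing.

```python
from types import SimpleNamespace


def resolve_model_name(response, fallback="unknown"):
    """Hypothetical helper: prefer response.model_version, else use the fallback."""
    version = getattr(response, "model_version", None)
    return version if version else fallback


with_version = SimpleNamespace(model_version="gemini-2.5-flash")
without_version = SimpleNamespace()  # e.g. a stream chunk that omits the field

name_present = resolve_model_name(with_version)
name_missing = resolve_model_name(without_version, fallback="gemini-2.5-flash")
```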