Usage Accounting: Track AI Model Usage with OpenRouter

The OpenRouter API provides built-in Usage Accounting that allows you to track AI model usage without making additional API calls. This feature provides detailed information about token counts, costs, and caching status directly in your API responses.

Usage Information

When enabled, the API will return detailed usage information including:

Prompt and completion token counts using the model’s native tokenizer
Cost in credits
Reasoning token counts (if applicable)
Cached token counts (if available)

This information is included in the last SSE message for streaming responses, or in the complete response for non-streaming requests.
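
The streaming case can be illustrated with a small parser that keeps the usage object from the final message. This is a minimal sketch, assuming the stream arrives as `data:`-prefixed SSE lines carrying JSON payloads and ends with a `[DONE]` sentinel; neither detail is stated above, and `extract_usage_from_sse` is a hypothetical helper name.

```python
import json
from typing import Iterable, Optional


def extract_usage_from_sse(lines: Iterable[str]) -> Optional[dict]:
    """Return the last `usage` object found in a sequence of SSE lines, or None."""
    usage = None
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, SSE comments (": ...") and any non-data fields.
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        try:
            chunk = json.loads(data)
        except json.JSONDecodeError:
            continue  # ignore malformed or partial payloads
        # Keep the most recent non-empty usage object seen so far.
        if isinstance(chunk, dict) and chunk.get("usage"):
            usage = chunk["usage"]
    return usage
```

Passing the decoded lines of a streaming response to this function returns `None` if no usage object was seen, which is what to expect when usage accounting was not enabled.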

Enabling Usage Accounting

You can enable usage accounting in your requests by including the usage parameter:

{
  "model": "your-model",
  "messages": [],
  "usage": {
    "include": true
  }
}
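
If several call sites build request bodies, the parameter can be added in one place. The following is a minimal sketch; `with_usage_accounting` is a hypothetical helper, not part of any OpenRouter SDK, and it only adds the `usage` object described above.

```python
import copy


def with_usage_accounting(payload: dict) -> dict:
    """Return a copy of a request payload with usage accounting enabled."""
    enriched = copy.deepcopy(payload)  # leave the caller's payload untouched
    enriched["usage"] = {"include": True}
    return enriched
```

Working on a copy keeps the original payload reusable for requests where usage accounting is not wanted.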

Response Format

When usage accounting is enabled, the response will include a usage object with detailed token information:

{
  "object": "chat.completion.chunk",
  "usage": {
    "completion_tokens": 2,
    "completion_tokens_details": {
      "reasoning_tokens": 0
    },
    "cost": 0.95,
    "cost_details": {
      "upstream_inference_cost": 19
    },
    "prompt_tokens": 194,
    "prompt_tokens_details": {
      "cached_tokens": 0
    },
    "total_tokens": 196
  }
}

cached_tokens is the number of tokens that were read from the cache. Retrieving the number of tokens written to the cache is not currently supported.
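
The nested token fields can be flattened into a single summary for logging. This is a minimal sketch, assuming the usage object has the shape shown above and that cached tokens are counted as part of `prompt_tokens` (an assumption used only for the ratio); `summarize_tokens` and `cache_read_ratio` are hypothetical names.

```python
def summarize_tokens(usage: dict) -> dict:
    """Flatten the token counts of a usage object into one dictionary."""
    prompt = usage.get("prompt_tokens", 0)
    completion = usage.get("completion_tokens", 0)
    # The detail objects may be missing or null, so fall back to empty dicts.
    prompt_details = usage.get("prompt_tokens_details") or {}
    completion_details = usage.get("completion_tokens_details") or {}
    cached = prompt_details.get("cached_tokens", 0)
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "reasoning_tokens": completion_details.get("reasoning_tokens", 0),
        "cached_tokens": cached,
        "total_tokens": usage.get("total_tokens", prompt + completion),
        # Share of the prompt read from cache; 0.0 when the prompt is empty.
        "cache_read_ratio": cached / prompt if prompt else 0.0,
    }
```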

Cost Breakdown

The usage response includes detailed cost information:

cost: The total amount charged to your account
cost_details.upstream_inference_cost: The actual cost charged by the upstream AI provider

Note: The upstream_inference_cost field only applies to BYOK (Bring Your Own Key) requests.
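
Reading the two cost fields might look like the sketch below. It assumes only the field names shown above; `cost_breakdown` is a hypothetical helper, and it reports `None` for `upstream_inference_cost` when the field is absent, which is the expected case for non-BYOK requests.

```python
from typing import Optional


def cost_breakdown(usage: dict) -> dict:
    """Extract the cost fields from a usage object without assuming both exist."""
    details = usage.get("cost_details") or {}
    cost: Optional[float] = usage.get("cost")
    upstream: Optional[float] = details.get("upstream_inference_cost")
    return {
        "cost": cost,  # total charged to the account
        "upstream_inference_cost": upstream,  # only present for BYOK requests
        "is_byok": upstream is not None,  # heuristic derived from the note above
    }
```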

Performance Impact

Enabling usage accounting will add a few hundred milliseconds to the last response as the API calculates token counts and costs. This only affects the final message and does not impact overall streaming performance.

Benefits

Efficiency: Get usage information without making separate API calls
Accuracy: Token counts are calculated using the model’s native tokenizer
Transparency: Track costs and cached token usage in real time
Detailed Breakdown: Separate counts for prompt, completion, reasoning, and cached tokens

Best Practices

Enable usage tracking when you need to monitor token consumption or costs
Account for the slight delay in the final response when usage accounting is enabled
Consider implementing usage tracking in development to optimize token usage before production
Use the cached token information to optimize your application’s performance
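
One way to monitor consumption during development is to accumulate the usage objects from successive responses. This is a minimal sketch, assuming the usage shape shown in Response Format; `UsageTracker` is a hypothetical class, not something the API provides.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Running totals across the usage objects of several responses."""
    requests: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0
    cached_tokens: int = 0
    cost: float = 0.0

    def record(self, usage: dict) -> None:
        """Add one response's usage object to the running totals."""
        details = usage.get("prompt_tokens_details") or {}
        self.requests += 1
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)
        self.cached_tokens += details.get("cached_tokens", 0)
        self.cost += usage.get("cost") or 0.0  # treat a missing cost as zero
```

Calling `record` with the `usage` object of each response gives totals that can be compared before and after a prompt or caching change.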

Alternative: Getting Usage via Generation ID

You can also retrieve usage information asynchronously by using the generation ID returned from your API calls. This is particularly useful when you want to fetch usage statistics after the completion has finished or when you need to audit historical usage.

To use this method:

Make your chat completion request as normal
Note the id field in the response
Use that ID to fetch usage information via the /generation endpoint

For more details on this approach, see the Get a Generation documentation.
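
The lookup step might be sketched as follows. The text above names only the /generation endpoint, so the full URL, the `id` query parameter, and the Bearer authorization header used here are assumptions to check against the Get a Generation documentation; `build_generation_request` is a hypothetical helper that prepares the request without sending it.

```python
import urllib.parse
import urllib.request

# Assumed full path, based on the base URL used by the chat completions examples.
GENERATION_URL = "https://openrouter.ai/api/v1/generation"


def build_generation_request(generation_id: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for one generation's usage data."""
    query = urllib.parse.urlencode({"id": generation_id})  # assumed parameter name
    return urllib.request.Request(
        f"{GENERATION_URL}?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )
```

The prepared request can then be sent with `urllib.request.urlopen` and the body decoded with `json.load`; the shape of that response is not described here.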

Examples

Basic Usage with Token Tracking

import requests

url = "https://openrouter.ai/api/v1/chat/completions"
headers = {
    # Replace {{API_KEY_REF}} with your OpenRouter API key.
    "Authorization": "Bearer {{API_KEY_REF}}",
    "Content-Type": "application/json"
}
payload = {
    "model": "{{MODEL}}",
    "messages": [
        {"role": "user", "content": "What is the capital of France?"}
    ],
    # Ask OpenRouter to include usage accounting in the response.
    "usage": {
        "include": True
    }
}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()  # surface HTTP errors before reading the body
data = response.json()  # parse the body once and reuse it
print("Response:", data["choices"][0]["message"]["content"])
print("Usage Stats:", data["usage"])

Streaming with Usage Information

This example shows how to handle usage information in streaming mode:

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="{{API_KEY_REF}}",
)

def chat_completion_with_usage(messages):
    # "usage" is not a named parameter of the OpenAI SDK's create() method,
    # so it is passed through extra_body.
    response = client.chat.completions.create(
        model="{{MODEL}}",
        messages=messages,
        extra_body={
            "usage": {
                "include": True
            }
        },
        stream=True
    )
    return response

for chunk in chat_completion_with_usage([
    {"role": "user", "content": "Write a haiku about Paris."}
]):
    # Print content deltas as they arrive; a usage-only chunk may have no choices.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
    # The usage object arrives with the last message and is empty before that.
    usage = getattr(chunk, "usage", None)
    if usage is not None:
        print("\nUsage Statistics:")
        print(f"Total Tokens: {usage.total_tokens}")
        print(f"Prompt Tokens: {usage.prompt_tokens}")
        print(f"Completion Tokens: {usage.completion_tokens}")
        # cost is an OpenRouter extension field, so read it defensively.
        print(f"Cost: {getattr(usage, 'cost', None)} credits")