Usage Accounting
The OpenRouter API provides built-in Usage Accounting that allows you to track AI model usage without making additional API calls. This feature provides detailed information about token counts, costs, and caching status directly in your API responses.
Usage Information
When enabled, the API will return detailed usage information including:
- Prompt and completion token counts using the model’s native tokenizer
- Cost in credits
- Reasoning token counts (if applicable)
- Cached token counts (if available)
This information is included in the last SSE message for streaming responses, or in the complete response for non-streaming requests.
Enabling Usage Accounting
You can enable usage accounting in your requests by including the usage
parameter:
Response Format
When usage accounting is enabled, the response will include a usage
object with detailed token information:
Performance Impact
Enabling usage accounting will add a few hundred milliseconds to the last response as the API calculates token counts and costs. This only affects the final message and does not impact overall streaming performance.
Benefits
- Efficiency: Get usage information without making separate API calls
- Accuracy: Token counts are calculated using the model’s native tokenizer
- Transparency: Track costs and cached token usage in real-time
- Detailed Breakdown: Separate counts for prompt, completion, reasoning, and cached tokens
Best Practices
- Enable usage tracking when you need to monitor token consumption or costs
- Account for the slight delay in the final response when usage accounting is enabled
- Consider implementing usage tracking in development to optimize token usage before production
- Use the cached token information to optimize your application’s performance
Alternative: Getting Usage via Generation ID
You can also retrieve usage information asynchronously by using the generation ID returned from your API calls. This is particularly useful when you want to fetch usage statistics after the completion has finished or when you need to audit historical usage.
To use this method:
- Make your chat completion request as normal
- Note the
id
field in the response - Use that ID to fetch usage information via the
/generation
endpoint
For more details on this approach, see the Get a Generation documentation.
Examples
Basic Usage with Token Tracking
Streaming with Usage Information
This example shows how to handle usage information in streaming mode: