Improve OpenAI streaming usage reporting
TL;DR
Always send `stream_options.include_usage` on streaming requests to ensure local and custom OpenAI-compatible backends report accurate context usage instead of defaulting to 0%.
## What changed OpenClaw now forces stream_options.include_usage on every streaming request to OpenAI and OpenAI-compatible endpoints. The update shipped May 19 2026. Local and custom backends that previously returned zero context usage now report actual token counts.
No user configuration is required. The flag is added automatically in the request body for all streaming calls.
## Why it matters Accurate usage data removes a common blind spot for builders who route traffic between paid APIs and self-hosted models. Without the flag, token tracking stayed broken and bills could jump without clear signals.
The change strengthens OpenClaw's position as a reliable daily driver for mixed-provider setups. It pressures other agent tools to fix the same reporting gap or accept higher support load from confused users.
## How to use it Pull the latest OpenClaw release through the existing CLI command. Restart the agent and send a test streaming prompt to any connected backend. Check the response metadata or provider dashboard for non-zero usage values.
The fix works on current stable versions and requires no YAML edits or new environment variables.
## Watch for Stable usage numbers across Ollama, LM Studio, and custom proxies will confirm the fix holds. A provider that still ignores the flag or returns inflated counts would break the improvement. Expect similar default flags for other streaming options in the next release cycle.
Harsh’s take
This patch removes one source of hidden cost for anyone running OpenClaw with multiple backends. The real trade-off is that every streaming call now carries a small extra payload, which adds up if you keep high-frequency heartbeats active.
Solo operators who treat OpenClaw as a 24/7 assistant should treat the update as a reminder that token visibility is still their responsibility, not the tool's. Set provider-side hard limits today rather than waiting for the next surprise invoice.
Do this now: review the last 30 days of usage logs from every connected model and add alerts at 70 percent of your monthly budget.
by Harsh Desai
About OpenClaw
View the full OpenClaw page →All OpenClaw updatesMore from OpenClaw
- FeatureExpand QA-Lab with runtime parity scenarios
Added comprehensive runtime parity tiers and token-efficiency artifacts to the QA-Lab, including specific checks for Codex-vs-Pi compatibility and tool fixture coverage.
- App UpdateUpdate Node.js requirement and Pi packages
Raised the minimum supported Node.js version to 22.19 and updated Pi packages to version 0.75.1 to ensure compatibility with the latest runtime features.
- App UpdateOptimize Gateway startup and restart latency
Reduced restart ready latency by overlapping startup logging and plugin-service initialization with channel sidecars while maintaining strict readiness gating.