Add Gemini text-to-speech support
TL;DR
Integrate Gemini TTS into the bundled Google plugin, supporting voice selection, WAV/PCM output, and automated setup guidance.
## What changed
OpenClaw added Gemini text-to-speech support to its bundled Google plugin on 18 May 2026. The update lets users pick from Gemini voices and output audio as WAV or PCM files. Automated setup guidance appears in the CLI during configuration.
The change ships as a free update for all self-hosted installs. No new pricing tier is required.
## Why it matters
Voice output turns OpenClaw from a text-only responder into an agent that can deliver briefings or alerts as spoken audio inside the same messaging apps users already check. This reduces context switching for Vibe Builders who run the agent on WhatsApp or Telegram.
The move pressures closed cloud agents to match multimodal output quickly. It also bets that local control over voice generation will matter more than raw model quality for daily personal-assistant tasks.
## How to use it
Pull the latest OpenClaw release through the existing CLI command. Run the Google plugin setup again and select Gemini as the TTS provider when prompted. Choose a voice name and set the output format to WAV or PCM in the YAML file.
Restart the agent. Test by sending a message that triggers a spoken reply. The feature works on any plan that already connects to Google models.
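
The two output formats differ mostly in packaging: PCM is the bare sample stream, while WAV adds a header that describes it. Below is a minimal Python sketch of that wrapping, using only the standard-library `wave` module. The 24 kHz, 16-bit, mono defaults are an assumption about what the Gemini TTS endpoint returns, and `pcm_to_wav` is a hypothetical helper name, not part of OpenClaw. Check the plugin's own documentation for the real parameters.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV (RIFF) container.

    Defaults are assumptions: 24 kHz, mono, 16-bit (2 bytes per sample).
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)  # header sizes are filled in when the writer closes
    return buf.getvalue()


# One second of 16-bit mono silence at the assumed 24 kHz rate.
silence = b"\x00\x00" * 24000
wav_bytes = pcm_to_wav(silence)
```

Most players open the WAV result directly. Raw PCM suits pipelines that already know the sample rate and bit depth and want to skip the header.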
## Watch for
Treat the bet as confirmed if users report reliable voice check-ins without extra token spend. Watch for quality drops or sudden API rate limits that break scheduled audio briefings. Expect a follow-up integration with at least one non-Google TTS option within the next quarter.
Harsh’s take
Gemini TTS inside OpenClaw gives a solo operator spoken reminders without opening another app, yet it still routes every word through Google's paid endpoint. The real trade-off is added latency and cost versus the simplicity of text-only heartbeats that already work.
Most Vibe Builders will hit the same wall they always do with self-hosted agents: setup friction plus unpredictable bills once audio generation starts running daily. Skip the feature if your current text flow already covers the job.
Update the Google plugin config this week and cap daily TTS usage at a fixed token budget before you turn it on.
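
One way to picture such a cap is a small guard that refuses TTS requests once the day's allowance is spent. The Python sketch below is illustrative only: `DailyTtsBudget` and `try_spend` are hypothetical names, the token counts are made up, and nothing here reflects a built-in OpenClaw setting.

```python
from datetime import date
from typing import Optional


class DailyTtsBudget:
    """Hypothetical guard: allow TTS spend until a daily token cap is reached."""

    def __init__(self, max_tokens_per_day: int) -> None:
        self.max_tokens_per_day = max_tokens_per_day
        self._day: Optional[date] = None
        self._spent = 0

    def try_spend(self, tokens: int, today: Optional[date] = None) -> bool:
        """Record the spend and return True, or return False if it would exceed the cap."""
        today = today or date.today()
        if today != self._day:  # first call of a new day resets the counter
            self._day = today
            self._spent = 0
        if self._spent + tokens > self.max_tokens_per_day:
            return False
        self._spent += tokens
        return True


budget = DailyTtsBudget(max_tokens_per_day=1000)
monday = date(2026, 5, 18)
first = budget.try_spend(600, monday)              # fits within the cap
second = budget.try_spend(600, monday)             # would exceed the cap
next_day = budget.try_spend(600, date(2026, 5, 19))  # counter has reset
```

When `try_spend` returns `False`, the agent could fall back to a plain text reply so scheduled briefings still arrive.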
by Harsh Desai
More from OpenClaw
- Feature: Expand QA-Lab with runtime parity scenarios
  Added comprehensive runtime parity tiers and token-efficiency artifacts to the QA-Lab, including specific checks for Codex-vs-Pi compatibility and tool fixture coverage.
- App Update: Update Node.js requirement and Pi packages
  Raised the minimum supported Node.js version to 22.19 and updated Pi packages to version 0.75.1 to ensure compatibility with the latest runtime features.
- App Update: Optimize Gateway startup and restart latency
  Reduced restart ready latency by overlapping startup logging and plugin-service initialization with channel sidecars while maintaining strict readiness gating.