xAI ships Grok 4.3 Beta with video understanding, slide creation, and new Speech APIs
TL;DR
xAI quietly launched Grok 4.3 Beta on April 17, adding native video understanding, AI slide creation, and new Speech-to-Text and Text-to-Speech APIs. The release came without an official xAI blog post or published model card.
What changed
What shipped
Grok 4.3 Beta soft-launched on April 17, 2026, with three new capability layers:
1. Native video understanding. Grok can now watch and analyse video directly, not just extract frames. Useful for tutorials, long-form content analysis, and video question-answering.
2. AI slide creation. Generate slide decks from prompts or source documents. Positioned against ChatGPT's slide features and Microsoft Copilot's presentation generation.
3. Speech APIs. Speech-to-Text and Text-to-Speech endpoints on the xAI API. Competitive with ElevenLabs and OpenAI's voice surfaces for developers building voice-first apps.
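Since xAI has published no API reference for this release, the exact endpoint paths and parameter names are unknown. A minimal sketch of what a Text-to-Speech call might look like, assuming an OpenAI-style `/v1/audio/speech` path and request body (both assumptions, flagged in the comments):

```python
import json
import os
import urllib.request

# Hypothetical endpoint path -- xAI has not published a reference for the
# new Speech APIs, so treat the URL and field names as placeholders.
XAI_TTS_URL = "https://api.x.ai/v1/audio/speech"  # assumed path


def build_tts_payload(text: str, voice: str = "default") -> dict:
    """Build a request body for the assumed Text-to-Speech endpoint."""
    return {"model": "grok-4.3-beta", "input": text, "voice": voice}


def synthesise(text: str) -> bytes:
    """POST the payload; returns raw audio bytes on success. Untested sketch."""
    req = urllib.request.Request(
        XAI_TTS_URL,
        data=json.dumps(build_tts_payload(text)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['XAI_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

Verify field names against xAI's docs once a reference ships; the payload builder is the only part worth keeping if the shapes differ.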
Release pattern
Unusually for a frontier model release, Grok 4.3 Beta arrived without:
- An xAI blog post announcing it.
- A published model card.
- Third-party benchmark results on launch day.
Access appeared quietly through the Grok app and API. Developers noticed the new model ID in the API options dropdown; independent coverage surfaced it within hours.
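That discovery path is easy to reproduce: the xAI API exposes an OpenAI-style model listing, so any script polling it would surface a new ID. A sketch, assuming the standard `{"data": [{"id": ...}, ...]}` response shape:

```python
import json
import os
import urllib.request

XAI_MODELS_URL = "https://api.x.ai/v1/models"


def grok_model_ids(models_response: dict) -> list[str]:
    """Extract Grok model IDs from an OpenAI-style /v1/models response."""
    return sorted(
        m["id"] for m in models_response.get("data", []) if "grok" in m["id"]
    )


def fetch_models() -> dict:
    """Fetch the live model list (requires XAI_API_KEY in the environment)."""
    req = urllib.request.Request(
        XAI_MODELS_URL,
        headers={"Authorization": f"Bearer {os.environ['XAI_API_KEY']}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Diffing `grok_model_ids(fetch_models())` against yesterday's output is how most "spotted in the dropdown" stories start.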
Access
Grok 4.3 Beta is available to Grok subscribers on the web, mobile apps, and through the xAI API. Treat beta as "functional but subject to change"; xAI has historically iterated rapidly on beta models before a formal full release.
Positioning
The capability bundle (video, slides, speech) suggests Grok is widening beyond pure chat to compete with multimodal-first assistants like Gemini and ChatGPT. The quiet launch is consistent with xAI's recent pattern of shipping features first and formalising the announcement later.
Who this matters for
- Vibe Builder: Ask Grok to generate a slide deck from your rough notes. Video Q&A is interesting if you work with tutorials or long-form content regularly.
- Basic User: If you are already a Grok subscriber, you have access to video understanding and slide generation at no extra cost. Try video summarisation first.
- Developer: Speech-to-Text and Text-to-Speech APIs drop xAI into the voice-app market alongside ElevenLabs and OpenAI. Evaluate pricing and latency against your current voice provider.
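For the latency comparison, a provider-agnostic timing harness is enough to start; wrap one request to each provider (xAI, ElevenLabs, OpenAI) in a zero-argument function and compare the numbers. A minimal sketch:

```python
import statistics
import time
from typing import Callable


def measure_latency(call: Callable[[], object], runs: int = 5) -> dict:
    """Time repeated calls to a provider and report simple latency stats.

    `call` is any zero-argument function wrapping one request to the
    voice provider under test.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    return {
        "runs": runs,
        "mean_s": statistics.mean(samples),
        "max_s": max(samples),
    }
```

Run it against the same input text for each provider; mean and worst-case together catch both steady-state speed and cold-start spikes.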
What to watch next
The quiet launch pattern is the interesting part. Most AI labs treat a new model as a marketing moment: blog post, model card, benchmark tables, launch partners. xAI is doing the opposite with Grok 4.3: ship the capability, let developers find it, let the usage data speak before the marketing.
There is a pragmatic reason. Benchmark gaming has gotten bad enough in 2026 that early benchmark scores are often the least useful information about a model. By the time independent evals catch up two weeks later, the narrative is already set. xAI may be betting that usage-driven word-of-mouth from actual developers is worth more than day-one benchmark theatrics.
For anyone evaluating Grok 4.3 Beta right now: the video understanding is the most differentiated feature in the bundle. Slide generation and speech APIs have three or four other strong players already; video understanding in the API has fewer credible options. If your use case is "analyse video programmatically," this is worth a test.
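For that test, a hedged sketch of what programmatic video Q&A might look like. The `video_url` content-part type is an assumption modelled on the common `image_url` convention for multimodal chat APIs; no reference for Grok's video input has been published, so check the actual shape before relying on it:

```python
def build_video_question(video_url: str, question: str) -> list[dict]:
    """Build a chat `messages` list pairing a video reference with a question.

    The "video_url" content-part type is an assumption modelled on the
    "image_url" convention; verify against xAI's docs once published.
    """
    return [
        {
            "role": "user",
            "content": [
                {"type": "video_url", "video_url": {"url": video_url}},
                {"type": "text", "text": question},
            ],
        }
    ]
```

Whatever the final shape, the evaluation question is the same: does a single API call over a long video beat your current frame-extraction pipeline on cost and answer quality?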
by Harsh Desai