xAI releases Grok 4.3 Beta with video understanding, slide creation, Speech APIs
TL;DR
You can analyze videos, generate slides, and use new Speech-to-Text and Text-to-Speech APIs in Grok 4.3 Beta, available since April 17.
What shipped
Grok 4.3 Beta soft-launched on April 17, 2026, with three new capabilities:
- Native video understanding. Grok can now watch and analyse video directly, not just extract frames. Useful for tutorials, long-form content analysis, and video question-answering.
- AI slide creation. Generate slide decks from prompts or source documents. Positioned against ChatGPT's slide features and Microsoft Copilot's presentation generation.
- Speech APIs. Speech-to-Text and Text-to-Speech endpoints on the xAI API. These put xAI in competition with ElevenLabs and OpenAI's voice products for developers building voice-first apps.
Release pattern
Unusually for a frontier model release, Grok 4.3 Beta arrived without:
- An xAI blog post announcing it.
- A published model card.
- Third-party benchmark results on launch day.
Access appeared quietly through the Grok app and API. Developers noticed the new model ID in the API options dropdown; independent coverage surfaced it within hours.
Access
Grok 4.3 Beta is available to Grok subscribers on the web, mobile apps, and through the xAI API. Treat beta as "functional but subject to change"; xAI has historically iterated rapidly on beta models before a formal full release.
Positioning
The capability bundle (video, slides, speech) suggests Grok is widening beyond pure chat to compete with multimodal-first assistants like Gemini and ChatGPT. The quiet launch is consistent with xAI's recent pattern of shipping features first and formalising the announcement later.
Who this matters for
- Vibe Builder: Ask Grok to generate a slide deck from your rough notes. Video Q&A is interesting if you work with tutorials or long-form content regularly.
- Basic User: If you are already a Grok subscriber, you have access to video understanding and slide generation at no extra cost. Try video summarisation first.
- Developer: Speech-to-Text and Text-to-Speech APIs drop xAI into the voice-app market alongside ElevenLabs and OpenAI. Evaluate pricing and latency against your current voice provider.
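For the latency half of that evaluation, a small timing harness is usually enough to get a first read. The sketch below is a minimal, provider-agnostic example: it times any callable several times and reports the median and worst case. The two `fake_*` functions are hypothetical stand-ins that only sleep; they are not xAI, ElevenLabs, or OpenAI calls. Swap in your own functions that send the same audio clip or text to each provider you are comparing.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latency(call: Callable[[], object], runs: int = 5) -> List[float]:
    """Invoke `call` `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        durations.append(time.perf_counter() - start)
    return durations


def summarise(durations: List[float]) -> Dict[str, float]:
    """Reduce raw timings to the median and the slowest run."""
    return {"median_s": statistics.median(durations), "max_s": max(durations)}


def compare_providers(
    providers: Dict[str, Callable[[], object]], runs: int = 5
) -> Dict[str, Dict[str, float]]:
    """Time every provider callable and return a summary per provider name."""
    return {
        name: summarise(measure_latency(call, runs))
        for name, call in providers.items()
    }


# Hypothetical placeholders: replace the bodies with real requests to the
# speech provider you use today and the one you are evaluating.
def fake_current_provider() -> None:
    time.sleep(0.02)


def fake_candidate_provider() -> None:
    time.sleep(0.01)


if __name__ == "__main__":
    results = compare_providers(
        {"current": fake_current_provider, "candidate": fake_candidate_provider},
        runs=5,
    )
    # Print the fastest median first.
    for name, stats in sorted(results.items(), key=lambda kv: kv[1]["median_s"]):
        print(f"{name}: median {stats['median_s']:.3f}s, max {stats['max_s']:.3f}s")
```

Median is used rather than mean so that one slow cold-start request does not dominate the comparison; the max is reported separately because tail latency matters in voice apps. Pricing still has to be checked against each provider's published rates.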
Harsh’s take
The quiet launch pattern is the interesting part. Most AI labs treat a new model as a marketing moment: blog post, model card, benchmark tables, launch partners. xAI is doing the opposite with Grok 4.3: ship the capability, let developers find it, let the usage data speak before the marketing.
There is a pragmatic reason. Benchmark gaming has gotten bad enough in 2026 that early benchmark scores are often the least useful information about a model. By the time independent evals catch up two weeks later, the narrative is already set. xAI may be betting that usage-driven word-of-mouth from actual developers is worth more than day-one benchmark theatrics.
For anyone evaluating Grok 4.3 Beta right now: the video understanding is the most differentiated feature in the bundle. Slide generation and speech APIs have three or four other strong players already; video understanding in the API has fewer credible options. If your use case is "analyse video programmatically," this is worth a test.
by Harsh Desai