Gemma 4 12B and Grok Imagine Video 1.5 debut, plus Hugging Face image and video models
TL;DR
Google and xAI pushed new multimodal and video models, while trending Hugging Face entries added fresh text-to-image and video options. Developers and builders gain more local and hosted tools for generation and editing tasks.
What shipped
On 3 June several major vendors released updated models and tools focused on image, video, and multimodal capabilities. Google led with a new 12B-parameter model and search features. xAI and Runway added video generation and editing updates, and NVIDIA shared physical AI research. Trending entries on Hugging Face and new apps on Product Hunt show continued movement toward accessible generation and agent-style workflows.
Hugging Face trending
Three models from different labs rose on the Hub, spanning text-to-image, any-to-any, and image-text-to-video tasks. Ideogram AI, Google, and ByteDance each placed one entry, giving users direct download and fine-tuning paths via standard libraries. These releases expand options for local experimentation without new infrastructure.
- Ideogram 4 FP8: Ideogram AI's Ideogram 4 FP8 text-to-image model topped the Hugging Face trending list. The model supports fine-tuning and inference through the Hub. Its FP8 weights are aimed at lighter, faster image generation than many prior open checkpoints.
- Gemma 4 12B IT: Google released Gemma 4 12B IT, an any-to-any model trending on the Hub. It is built for the Transformers library and supports direct download and fine-tuning for multimodal tasks on modest hardware.
- Bernini R: ByteDance added Bernini R, an image-text-to-video model now trending on Hugging Face. Researchers and builders can download it to create short video clips from combined image and text prompts.
Vendor launches
Google supplied the largest share of updates: a multimodal model, a consumer app, new shopping features in Search, and search controls for site owners. xAI brought a video model to Vercel's AI Gateway, while NVIDIA focused on physical AI skills for robotics and vehicles. Together the releases span consumer-facing generation and research workflows.
- Grok Imagine Video 1.5: xAI launched Grok Imagine Video 1.5 on AI Gateway for single-pass image-to-video generation with audio. The update improves character consistency and lighting, giving creators longer clips with better prompt adherence than the prior version.
- Dreambeans: Google introduced Dreambeans, an app that uses its latest models to curate daily stories based on user interests. SMB owners can test it to surface relevant content without manual curation.
- Gemma 4 12B: Google released Gemma 4 12B, a unified, encoder-free multimodal model designed to run on laptops. Developers gain a local multimodal option aimed at competing with larger hosted systems.
- NVIDIA Physical AI Skills: NVIDIA unveiled new agent skills at CVPR for autonomous vehicles, robotics, and vision AI. Researchers can use the tools to reconstruct scenes and train policies at scale.
- NVIDIA Grasping Research: NVIDIA published research on robotic grasping that handles novel objects, along with work on autonomous driving and large-scale agent training. The work targets practical deployment in warehouses and driving systems.
- Google Search Thrifting: Google added AI features in Search and Shopping to help users find second-hand items faster. Vintage sellers and buyers get direct suggestions without separate apps.
- Search Controls for Owners: Google released new tools that let website owners manage how their content appears in AI search results. Publishers gain options to limit or shape AI summaries.
Replicate new models
- Aleph 2: Runway released Aleph 2 on Replicate for editing entire videos from single-frame changes. Users can now handle clips of up to 30 seconds with keyframe references, reducing manual re-rendering time.
Product Hunt picks
Eight tools appeared on Product Hunt, covering agent harnesses, document templates, brand controls, and local chat apps. Several entries target faster iteration for builders and teams working with coding agents or content generation.
- Composer: A multiplayer Markdown editor that lets teams and agents edit the same document. Writers and small teams can collaborate in real time with AI assistance.
- Replicas: A service that lets users run coding agent harnesses in the cloud. Developers avoid local setup when testing multiple agent frameworks.
- Dropstone 1.5: A new plan offers twice the usage of Claude Code Pro for a fixed monthly fee. Heavy users can cut costs while keeping the same model access.
- Carbone Skill for AI: A skill that teaches AI systems to generate document templates on demand. Office teams reduce repetitive formatting work.
- Handler: A tool that presents AI edits as stacked pull requests for review before merge. Teams gain clearer oversight of generated code changes.
- Brand Context API: Brandfetch released an API that keeps AI outputs aligned with brand guidelines. Marketing teams can enforce voice and visual rules at generation time.
- EchoFlow: A native Android chat app that stores all conversations locally. Users who want on-device privacy gain an offline alternative to cloud services.
- Hermes Desktop: A desktop agent app designed to scale with user needs over time. Individuals can start simple and add capabilities without switching platforms.
What this means for you
For Vibe Builders: You can pull trending image and video models from Hugging Face and test them locally, or try Runway's Aleph 2 on Replicate, with little new code. The Google and xAI releases give you ready-made video generation and story curation that plug into existing workflows. Product Hunt tools such as Handler and Brand Context API let you review AI edits and keep outputs on-brand with minimal setup.
For Non-techies: Google added Search features that surface second-hand finds, new controls for site owners, and Dreambeans for daily curated stories. If you run a website, you can now shape or limit how your content appears in AI search results. You can also try simple apps that cut manual curation work.
For Developers: Gemma 4 12B ships as a laptop-ready multimodal model. NVIDIA released physical AI skills at CVPR alongside new grasping research. Runway's Aleph 2 on Replicate and the Replicas cloud harness give you concrete options to benchmark local versus hosted video editing and agent runs before integrating them into production pipelines.
What to watch next
Track adoption numbers for Gemma 4 12B on Hugging Face and any follow-up benchmarks from NVIDIA on grasping tasks. Watch for expanded support of Grok Imagine Video 1.5 on additional gateways and new Product Hunt entries that extend agent review flows.
Harsh’s take
The day shows a split between flashy consumer video tools and narrower research releases that still require engineering effort to use. Google dominates by volume, yet most of its items target Search or light apps rather than major jumps in core model capability. Builders should pick one new model from the Hugging Face list and run a short fine-tune this week. That will show whether the claimed quality holds on their own data before they commit to any vendor stack.
by Harsh Desai
Sources
Hugging Face trending
- ideogram-4-fp8 by ideogram-ai trends on Hugging Face
- gemma-4-12B-it by google trends on Hugging Face
- Bernini-R by ByteDance trends on Hugging Face
Vendor launches
- Grok Imagine Video 1.5 on AI Gateway
- Meet Dreambeans, an app that connects you with what matters
- Introducing Gemma 4 12B: a unified, encoder-free multimodal model
- NVIDIA Enables the Next Era Of Physical AI Research With Agent Skills For Autonomous Vehicles, Robotics And Vision AI
- NVIDIA Research Unlocks Advanced Grasping, Smarter Autonomous Driving and Agent Training at Scale
- 5 ways Google Search can level up your thrift and vintage shopping
- Alphabet investor presentation: June 2026
- New opportunities, control and insights for website owners
Replicate new models
Product Hunt picks