Skip to content
Qwen 3.7 Plus and MiniMax M3 land on Vercel AI Gateway plus new Replicate and HF models | Daily AI roundup cover

Qwen 3.7 Plus and MiniMax M3 land on Vercel AI Gateway plus new Replicate and HF models

By Harsh Desai
Share

TL;DR

Vercel added two new agent-ready models to its gateway, Replicate and Hugging Face surfaced fresh image and multimodal checkpoints, and several infrastructure tweaks from Vercel and NVIDIA shipped for builders and operators.

What shipped

On 1 June new model endpoints and platform updates arrived across major hosts. Vercel, NVIDIA, Replicate and Hugging Face each released items aimed at agents, local inference and production reliability. The day’s releases give builders immediate options for vision-language agents and memory-safe builds.

Replicate new models

birgit-portrait: birgit-ploeger shipped birgit-portrait on Replicate for prompt-driven portrait generation. The model supports LoRA scaling and fast inference modes and runs through the public API or playground. Vibe builders can drop it into mockups or client demos today without managing weights.

Vendor launches

Vercel updated its AI Gateway with two new multimodal models and added OIDC and memory safeguards to its storage and build services. NVIDIA released four infrastructure announcements covering factory AI, cloud capacity and local agents on RTX hardware.

  • Qwen 3.7 Plus on AI Gateway Alibaba’s Qwen 3.7 Plus reached Vercel AI Gateway with unified vision-language agent skills for GUI, CLI and coding tasks. Set the model string to alibaba/qwen-3.7-plus in the AI SDK to test agent harnesses. Developers can route calls through one endpoint instead of managing separate providers.
  • Google I/O 2026 AI production notes Google published how internal teams used Gemini to create assets and logistics for Google I/O 2026. The post shows concrete prompt patterns for slide generation and scheduling. Event producers can copy the workflow for similar large-scale content tasks.
  • NVIDIA AI Cloud expansion NVIDIA announced new partner clouds purpose-built for high token demand from agentic applications. Enterprises and labs can choose regional capacity instead of building their own clusters. The move shortens time-to-compute for teams scaling inference workloads.
  • NVIDIA local agents on RTX and DGX Spark NVIDIA detailed open-source agent projects such as OpenClaw and Hermes that now run on-device. The agents handle multi-step tasks and app interaction without cloud calls. Developers testing local agents can benchmark against these examples on consumer RTX cards.
  • MiniMax M3 on AI Gateway MiniMax M3 joined Vercel AI Gateway with a 1M-token window and native multimodal input. Set the model string to minimax/minimax-m3 to run terminal tool use or web browsing agents. Builders gain another long-context option through the same unified SDK.

Hugging Face trending

Three models climbed the Hugging Face trending list: two image-text-to-text checkpoints and one text-to-image model. Each ships ready for download and local fine-tuning through the Hub.

  • Step-3.7-Flash-GGUF stepfun-ai published Step-3.7-Flash-GGUF, a GGUF-quantised image-text-to-text model now trending on the Hub. Users can download it for local inference or further fine-tuning. Prototype teams gain a fast multimodal checkpoint without custom quantisation work.
  • Keye-VL-2.0-30B-A3B Kwai-Keye released Keye-VL-2.0-30B-A3B, a 30B image-text-to-text model trending on Hugging Face. It runs via the transformers library and includes evaluation numbers on the model card. Researchers can compare it directly against other open vision-language baselines.
  • bonsai-image-ternary-4B-gemlite-2bit prism-ml launched bonsai-image-ternary-4B-gemlite-2bit, a 2-bit text-to-image model trending on the Hub. The diffusers integration lets users run it locally with low memory. Small-team creators can generate images on consumer GPUs without heavy infrastructure.

Product Hunt picks

Three tools appeared on Product Hunt: an API client for Figma mockups, an agent for long-running coding work, and a proxy that surfaces LLM spend. Each targets a narrow workflow pain point.

  • Mistral Vibe Mistral Vibe launched an agent built for multi-step coding and long-running tasks. Users assign it terminal or editor work that spans hours. Solo developers gain a persistent helper without stitching separate scripts.
  • Tokenwise Tokenwise introduced a proxy that flags over-spend across LLM calls and suggests cheaper routes. Teams connect it once and receive usage reports. Finance leads at small companies can cut inference bills without rewriting client code.

What this means for you

For Vibe Builders: You can now call Qwen 3.7 Plus and MiniMax M3 through one Vercel endpoint for agent tasks that mix vision and long context. The new Replicate portrait model and trending Hugging Face checkpoints give quick image and multimodal options you drop into prototypes. Product Hunt tools like Mistral Vibe and Tokenwise let you test persistent agents and cost tracking without writing glue code.

For Non-techies: SMB owners gain two new models on Vercel that handle image-plus-text work and long conversations, so customer support or content tools can process richer inputs. Local-agent updates from NVIDIA and the Mistral Vibe launch on Product Hunt point toward agents that run on existing laptops. Tokenwise shows exactly where spend is high so monthly AI bills stay predictable.

For Developers: Vercel’s AI Gateway now routes to Qwen 3.7 Plus and MiniMax M3 while its Blob and build services added OIDC and OOM guards. NVIDIA’s factory blueprint and RTX agent examples supply reference architectures for on-prem and edge deployments. Hugging Face trending models plus the three Product Hunt utilities give concrete items to benchmark against current stacks this week.

What to watch next

Watch for production numbers from the new Qwen and MiniMax endpoints on Vercel. Track whether NVIDIA’s FOX blueprint appears in partner case studies. Monitor the three trending Hugging Face models for community fine-tunes and usage benchmarks.

Harshs take

The day’s biggest movement is the addition of two long-context multimodal models to Vercel’s gateway while NVIDIA pushes reference designs for both factory floors and local PCs. The pattern shows platforms racing to own the agent runtime layer rather than just the model weights. Builders should pick one new gateway model and one local checkpoint, wire them into the same small workflow, and measure latency and cost against their current stack before the next wave of releases lands.

by Harsh Desai

More AI news

Everything AI. One email.
Every Monday.

New tools. Model launches. Plugins. Repos. Tactics. The moves the sharpest builders are making right now, before everyone else.

No spam. Unsubscribe anytime.