Vercel adds Agent Runs and FUSE support, Microsoft plans AutoPilot agents, and a benchmark rethink for builders
TL;DR
Vercel rolled out deeper agent inspection and filesystem tools, while Microsoft plans to merge its Copilot apps into a single super app. A new UK AISI study suggests agents are stronger than standard benchmarks indicated.
What shipped
On 3 July several practical AI tooling updates landed for builders and teams. Vercel extended its platform with agent tracing and sandbox mounts. Microsoft and benchmark researchers also moved the conversation from chat interfaces toward measurable agent performance.
Vendor launches
Vercel released three updates focused on agent workflows and deployment control. The changes center on tracing, storage mounts, and flag management through existing CLI and MCP surfaces. These tools target teams already shipping on the platform rather than new entrants.
- Agent Runs in Vercel MCP and CLI: Vercel now lets users query Agent Runs directly from its MCP and CLI for the eve framework. Teams can list projects, pull recent runs, and inspect full traces with reasoning steps and token counts after deploying to Vercel.
- Vercel Sandbox FUSE support: Vercel Sandbox added FUSE mounts so users can attach S3 buckets or network filesystems as local paths. This removes the need to copy large datasets into the sandbox before running tools that expect standard file paths.
- Vercel Flags segments via CLI: The new vercel flags segments command lets teams edit targeting rules from the terminal or CI scripts. Users can add or remove include and exclude tokens or replace entire segment definitions with JSON.
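To make the FUSE item above concrete, here is a minimal Python sketch of what "standard file paths" buys you: code that only knows about directories should work unchanged whether the directory is local or a mounted bucket. The mount path `/mnt/dataset` and the helper name `count_lines` are assumptions for illustration, not Vercel APIs. The mount itself is configured on the Sandbox side and is not shown here.

```python
from pathlib import Path


def count_lines(root: str, pattern: str = "*.jsonl") -> dict[str, int]:
    """Count lines in every matching file under root, streaming each file.

    Nothing here is FUSE-specific: a mounted bucket is read like any directory.
    """
    counts: dict[str, int] = {}
    for path in sorted(Path(root).rglob(pattern)):
        with path.open("r", encoding="utf-8") as handle:
            # Iterate line by line so large files are never loaded whole.
            counts[path.relative_to(root).as_posix()] = sum(1 for _ in handle)
    return counts


if __name__ == "__main__":
    # Hypothetical mount point; replace with wherever the bucket is attached.
    mount = Path("/mnt/dataset")
    if mount.is_dir():
        print(count_lines(str(mount)))
    else:
        print(f"{mount} is not mounted here")
```

Because the function takes a plain path, the same code can be tried against a local folder first and pointed at the mount later.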
Product Hunt picks
Four new tools appeared on Product Hunt aimed at desktop and workflow automation. They range from Mac app builders to issue reproduction agents. Most target individual users or small teams looking for quick AI helpers rather than enterprise stacks.
- Glaze by Raycast: Glaze turns chat prompts into standalone Mac apps without writing code. Users describe the app and the tool generates a working version for local use.
- Tamamon: Tamamon creates a desktop pet that levels up based on time spent in Claude Code. The pet serves as a visual progress tracker for solo developers.
- Osloq: Osloq is an agent that opens GitHub issues and attempts to reproduce them locally. It reports steps and failures back to the user for faster triage.
- Goals from Loops: Goals from Loops tracks whether marketing campaigns hit intended business outcomes. It pulls data from existing tools and surfaces simple success metrics.
Industry news
New data from the UK's AI Security Institute showed that a tenfold increase in token budget lifts agent results on software engineering tasks by roughly 25 percent. Microsoft plans to merge its Copilot apps and add paid AutoPilot agents. A developer course author also noted falling sales linked to AI job concerns.
- UK AISI benchmark study: The UK's AI Security Institute found that standard tests understate agent ability when token limits stay low. Raising the budget by ten times improved software engineering results by about 25 percent, with newer models gaining the most.
- Browser alternatives overview: TechCrunch listed current options challenging Chrome and Safari, including several that emphasize local AI features or privacy controls over search defaults.
- Microsoft Copilot overhaul: Microsoft plans to combine consumer and enterprise Copilot into one app in August and introduce background AutoPilot agents for an added fee. The move follows similar shifts at Anthropic and OpenAI.
- AI glossary from TechCrunch: TechCrunch published a reference list of current AI terms to help readers parse announcements without prior jargon knowledge.
Hugging Face trending
Two new papers on Hugging Face explore better camera motion and viewpoint handling for vision-language-action models in robotics. Both focus on closing the gap between simulated training and real-world spatial tasks.
- LIME paper: LIME trains models to predict useful camera movements from egocentric video before acting. The approach aims to improve intent-aware navigation for robots that must inspect objects first.
- Moving Eye paper: The Moving Eye work tests hybrid data collection to reduce shortcut learning in VLA models. It shows that simply adding more viewpoints is not enough for reliable spatial generalization.
What this means for you
For Vibe Builders: You can now inspect agent traces from the Vercel CLI and MCP, and mount external storage inside Vercel Sandboxes without copying the data in first. Product Hunt tools like Glaze and Osloq let you try small agents for Mac apps or GitHub work. Before you chain models into your own workflows, use the AISI findings as a prompt to check which ones actually improve when given more tokens.
For Non-techies: Microsoft plans to fold its Copilot tools into one app, with paid background agents that can run tasks for you. Product Hunt tools such as Tamamon or Goals from Loops offer simple tracking. The UK study on agent benchmarks suggests standard tests may understate what these tools can already handle.
For Developers: Vercel CLI and MCP commands now expose full Agent Runs traces, and Vercel Sandbox supports FUSE mounts, both of which help when debugging deployed agents. The AISI findings indicate you should retest your agent setups with higher token budgets before trusting older benchmark numbers. Microsoft's AutoPilot agents and the two new VLA papers on Hugging Face give concrete signals on where runtime and spatial reliability are heading next.
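One way to run the retest suggested for developers is a small harness that runs the same task set at two token budgets and compares success rates. This is a sketch under stated assumptions: `run_task` is a hypothetical callable you would supply to invoke your own agent and return True on success. The task names, costs, and budgets in the demo are invented for illustration and are not benchmark data.

```python
from typing import Callable, Sequence

# Hypothetical signature: (task, token_budget) -> True if the agent succeeded.
RunTask = Callable[[str, int], bool]


def success_rate(run_task: RunTask, tasks: Sequence[str], budget: int, trials: int = 1) -> float:
    """Fraction of (task, trial) attempts that succeed at a given token budget."""
    if not tasks or trials < 1:
        raise ValueError("need at least one task and one trial")
    wins = sum(bool(run_task(task, budget)) for task in tasks for _ in range(trials))
    return wins / (len(tasks) * trials)


def compare_budgets(
    run_task: RunTask, tasks: Sequence[str], low: int, high: int, trials: int = 1
) -> dict[str, float]:
    """Run the same tasks at a low and a high budget and report the difference."""
    low_rate = success_rate(run_task, tasks, low, trials)
    high_rate = success_rate(run_task, tasks, high, trials)
    return {"low": low_rate, "high": high_rate, "delta": high_rate - low_rate}


if __name__ == "__main__":
    # Stub agent for demonstration only: a task "succeeds" when the budget
    # covers its made-up cost. Swap in a call to your real agent.
    costs = {"fix-bug": 8_000, "add-test": 20_000, "refactor": 90_000, "migrate": 150_000}

    def stub_agent(task: str, budget: int) -> bool:
        return budget >= costs[task]

    print(compare_budgets(stub_agent, list(costs), low=10_000, high=100_000))
```

Real agents vary from run to run, so raise `trials` and treat small deltas with caution before drawing conclusions.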
What to watch next
Track whether Microsoft releases the merged Copilot app on schedule in August. Check for follow-up AISI guidance on token budgets for agent evals. Watch Hugging Face for more VLA spatial generalization results in the coming days.
Harsh’s take
The day showed steady platform extensions rather than model leaps. Vercel and Microsoft both doubled down on agent surfaces that still require careful prompting and monitoring. The AISI study exposed how weak current evaluations remain, which means many production claims rest on shaky ground.
A second-order effect is that builders may over-invest in tooling that looks capable only because benchmarks were capped. Re-running key tasks with larger budgets should become standard before any rollout.
This week, pick one agent workflow you already run and double its token allowance, then measure the actual change in success rate.
by Harsh Desai
Sources
Vendor launches
- Agent Runs now available in the Vercel MCP and CLI
- Vercel Sandbox now supports FUSE-based filesystems
- Manage Vercel Flags segments with Vercel CLI
Product Hunt picks
Industry news
- A device that revives eyeballs from dead donors could make eye transplants possible
- Google DeepMind Unionization Talks Are Off to a Rocky Start
- UK's AI Security Institute finds standard benchmarks systematically underestimate what AI agents can actually do
- The browser wars aren’t about search anymore: here are the best alternatives to Chrome and Safari
- Microsoft follows Anthropic and OpenAI into the AI super app race with overhauled Copilot and AutoPilot agents
- The only AI glossary you’ll need this year
- Quoting Josh W. Comeau
Hugging Face trending