bytedance/UI-TARS Desktop
The Open-Source Multimodal AI Agent Stack: Connecting Leading AI Models and Agent Infra
UI-TARS Desktop is ByteDance's open-source stack for multimodal AI agents that see and control a real computer screen to operate apps and browsers. It pairs the UI-TARS GUI-agent models with Agent TARS, a desktop app and infrastructure for building computer-use and browser-use agents.
Our Review
From ByteDance, UI-TARS Desktop (now branded Agent TARS) has reached 35,000 GitHub stars as one of the most capable open-source takes on computer-use agents. The idea is simple but hard: a vision-language model looks at the actual screen, decides where to click and type, and drives software the way a person would, with no per-app API needed.
What UI-TARS Desktop does:
- •Vision-based computer use a vision-language model reads the screen and controls the mouse and keyboard to operate desktop apps.
- •Browser automation drive web tasks through a real browser, navigating and acting like a human user.
- •UI-TARS agent models built around ByteDance's open UI-TARS GUI-agent models, tuned for on-screen action.
- •Agent TARS app a desktop application to run, watch, and direct these agents on your own machine.
- •MCP support connect external tools and an MCP server to extend what the agent can do.
- •Open-source stack models, app, and agent infrastructure released under Apache-2.0 for self-hosting and research.
Getting started:
Download Agent TARS from the releases or agent-tars.com, or build from the repo. Connect a UI-TARS or compatible vision model, grant screen and input access, and give it a task. Docs are linked from the repo.
Limitations:
Computer-use agents are still early in 2026: they can misclick, get confused by unfamiliar interfaces, and need supervision, so this is closer to a powerful experiment than a hands-off product. Giving an AI control of your mouse, keyboard, and browser carries real security and privacy risk, so run it in a sandbox for anything sensitive. It is a developer-oriented stack, and performance depends on the vision model you run.
Our Verdict
UI-TARS Desktop is one of the most serious open-source efforts at computer-use AI in 2026. If you want to experiment with an agent that actually sees your screen and operates apps and browsers, rather than calling APIs, ByteDance's stack is a leading place to start, with 35,000 stars and an Apache-2.0 license.
For developers, the appeal is an open, self-hostable alternative to closed computer-use products: bring the UI-TARS models or another vision model, run the Agent TARS app locally, and extend it with MCP tools. It is research-grade, so expect to tune and supervise, but you keep full control of the model and the machine.
Skip UI-TARS Desktop if you need a reliable, production-ready automation tool today; for deterministic browser tasks, a structured tool like Playwright MCP is steadier than a vision agent. If you want a managed computer-use experience, a hosted assistant is less to run and supervise.
Frequently Asked Questions
What is UI-TARS Desktop?
UI-TARS Desktop, now branded Agent TARS, is an open-source multimodal AI agent stack from ByteDance. It uses vision-language models to look at your computer screen and control the mouse, keyboard, and browser, so an agent can operate real applications the way a person does. It includes the UI-TARS models, a desktop app, and supporting agent infrastructure.
Is UI-TARS Desktop free and open source?
Yes. UI-TARS Desktop is released under the Apache-2.0 license and is free and open source as of 2026. The desktop app and agent stack cost nothing to use. Your costs come from the vision-language model you run behind it, whether that is a hosted model or local hardware to run the open UI-TARS models.
What can UI-TARS Desktop automate?
UI-TARS Desktop is built for on-screen and browser automation. It can navigate websites, fill forms, click through desktop applications, and carry out multi-step tasks by perceiving the interface visually rather than relying on each app's API. Because it acts through the GUI, it can drive software that has no automation interface, though it still needs supervision.
Is it safe to let UI-TARS Desktop control my computer?
Use caution. As of 2026, giving any AI control of your mouse, keyboard, and browser carries real security and privacy risk, since a mistaken or manipulated action can affect real accounts and files. The safest approach is to run it in a sandbox or a dedicated environment, limit its access, and supervise tasks that touch sensitive data.
How is UI-TARS Desktop different from Playwright MCP?
Playwright MCP drives a browser deterministically through the accessibility tree and structured commands. UI-TARS Desktop uses a vision-language model to act on what it sees on screen, including desktop apps beyond the browser. Choose Playwright MCP for reliable, scripted browser tasks; choose UI-TARS Desktop to experiment with general computer-use agents that perceive the screen.
How do I install UI-TARS Desktop?
Visit the GitHub repository at https://github.com/bytedance/UI-TARS-desktop for installation instructions.
What license does UI-TARS Desktop use?
UI-TARS Desktop uses the Apache-2.0 license.
What are alternatives to UI-TARS Desktop?
Explore related tools and alternatives on My AI Guide.
Open source & community-verified
Apache-2.0 licensed: free to use in any project, no strings attached. 35,793 developers have starred this, meaning the community has reviewed and trusted it.
Reviewed by My AI Guide for relevance, quality, and active maintenance before listing.
Topics