ByteDance: UI-TARS 7B

$0.0004/1k

$0.0008/1k

bytedance/ui-tars-1.5-7b

上下文长度: 128,000 text+image->text Other 2025-07-23 更新

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Built by ByteDance, it builds upon the UI-TARS framework with reinforcement learning-based reasoning, enabling robust action planning and execution across virtual interfaces. This model achieves state-of-the-art results on a range of interactive and grounding benchmarks, including OSworld, WebVoyager, AndroidWorld, and ScreenSpot. It also demonstrates perfect task completion across diverse Poki games and outperforms prior models in Minecraft agent tasks. UI-TARS-1.5 supports thought decomposition during inference and shows strong scaling across variants, with the 1.5 version notably exceeding the performance of earlier 72B and 7B checkpoints.

模型参数

架构信息

模态: text+image->text

Tokenizer: Other

限制信息

上下文长度: 128,000

最大回复长度: 2,048

ByteDance: UI-TARS 7B

模型参数

架构信息

限制信息

相关模型

Z.ai: GLM 5V Turbo

Z.ai: GLM 5 Turbo

Z.ai: GLM 5

Z.ai: GLM 4.7 Flash

Z.ai: GLM 4.7

Z.ai: GLM 4.6V