Researchers have developed ReVision, a method that cuts token usage in computer-use agents by 46% while boosting success rates by 3% across three industry benchmarks.
Computer-use agents rely on screenshots to navigate graphical interfaces, but each image generates thousands of visual tokens. As interaction histories grow longer, token costs spiral upward, forcing developers to limit how much context their agents can process.
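To get a feel for how quickly those costs add up, here is a rough back-of-the-envelope sketch. It assumes each visual token covers a fixed square of pixels. The 28-pixel side length, the 1920x1080 resolution, and the 32,000-token budget are illustrative assumptions, not figures from the paper, and real vision encoders typically resize images before tokenizing them.

```python
import math

def visual_tokens(width: int, height: int, pixels_per_token: int = 28) -> int:
    """Approximate tokens for one screenshot, assuming each token covers a
    pixels_per_token x pixels_per_token square (an illustrative assumption)."""
    cols = math.ceil(width / pixels_per_token)
    rows = math.ceil(height / pixels_per_token)
    return cols * rows

def history_tokens(num_screenshots: int, width: int = 1920, height: int = 1080,
                   pixels_per_token: int = 28) -> int:
    """Total visual tokens when every screenshot in the history is kept in full."""
    return num_screenshots * visual_tokens(width, height, pixels_per_token)

def max_screenshots(token_budget: int, width: int = 1920, height: int = 1080,
                    pixels_per_token: int = 28) -> int:
    """How many full screenshots fit in a fixed visual-token budget."""
    return token_budget // visual_tokens(width, height, pixels_per_token)

per_image = visual_tokens(1920, 1080)   # 69 columns * 39 rows = 2691 tokens
five_frames = history_tokens(5)         # 5 * 2691 = 13455 tokens
fits = max_screenshots(32_000)          # 11 screenshots under a hypothetical budget
print(per_image, five_frames, fits)
```

Under these assumptions the cost grows linearly with history length, which is why any reduction in tokens per screenshot translates directly into more history within the same budget.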
The ReVision approach trains multimodal language models to identify and remove redundant visual patches between consecutive screenshots. A learned patch selector compares representations across frames while preserving the spatial structure that models need for accurate navigation.
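The general idea of dropping patches that have not changed between frames can be illustrated with a much simpler stand-in. The sketch below is not ReVision's learned selector: it compares raw pixel values with a fixed threshold instead of learned representations, and every name in it (`split_into_patches`, `select_changed_patches`, `threshold`) is hypothetical. It does show one way to keep spatial structure, by returning the grid positions of the retained patches rather than a flat list of pixels.

```python
def split_into_patches(frame, patch):
    """Split a 2D grid of pixel values into patches keyed by (row, col) grid position."""
    rows, cols = len(frame), len(frame[0])
    patches = {}
    for r in range(0, rows, patch):
        for c in range(0, cols, patch):
            patches[(r // patch, c // patch)] = tuple(
                frame[y][x]
                for y in range(r, min(r + patch, rows))
                for x in range(c, min(c + patch, cols))
            )
    return patches

def select_changed_patches(prev_frame, curr_frame, patch=2, threshold=0.0):
    """Return sorted grid positions of patches in curr_frame whose mean absolute
    pixel difference from prev_frame exceeds threshold. A pixel-level stand-in
    for a learned selector, used here only for illustration."""
    prev = split_into_patches(prev_frame, patch)
    curr = split_into_patches(curr_frame, patch)
    kept = []
    for pos, values in curr.items():
        old = prev.get(pos)
        if old is None or len(old) != len(values):
            kept.append(pos)  # no comparable patch in the previous frame: keep it
            continue
        diff = sum(abs(a - b) for a, b in zip(values, old)) / len(values)
        if diff > threshold:
            kept.append(pos)
    return sorted(kept)

# Two 4x4 "screenshots" that differ in a single pixel.
prev = [[0] * 4 for _ in range(4)]
curr = [row[:] for row in prev]
curr[3][2] = 255

kept = select_changed_patches(prev, curr)
print(kept)  # [(1, 1)]: only the bottom-right 2x2 patch changed
```

In this toy case three of the four patches are identical across frames and would be dropped, while the retained patch keeps its original grid position so a model could still tell where on the screen the change occurred.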
Performance gains across benchmarks
Testing on OSWorld, WebTailBench, and AgentNetBench using Qwen2.5-VL-7B, ReVision processed trajectories with five history screenshots more efficiently than baseline methods. The technique maintained spatial coherence while eliminating visual redundancy that typically bloats token counts.
The research team found that, once redundancy was removed, performance kept improving as more historical observations were added. This challenges the common assumption that visual history saturates because older screenshots add little useful information.
Instead, the findings suggest that apparent saturation stems from inefficient token representations rather than inherent limitations of historical context. The method enables agents to process longer interaction sequences within fixed compute budgets.
The paper, authored by researchers including Amirhossein Abaskohi, Yuhang He, and Peter West, was submitted to arXiv on May 11, 2026. The work addresses a fundamental bottleneck in computer-use agents that has limited their ability to leverage extended interaction histories.
ReVision's efficiency gains could enable more sophisticated agent behaviors by allowing models to consider longer sequences of past actions and observations without hitting token limits.