Today we introduced a research preview of Operator, an agent that can go to the web to perform tasks for you. Powering Operator is Computer-Using Agent (CUA), a model that combines GPT‑4o's vision capabilities with advanced reasoning through reinforcement learning. CUA is trained to interact with graphical user interfaces (GUIs), meaning the buttons, menus, and text fields people see on a screen, just as humans do. This gives it the flexibility to perform digital tasks without using OS- or web-specific APIs.
CUA builds off of years of foundational research at the intersection of multimodal understanding and reasoning. By combining advanced GUI perception with structured problem-solving, it can break tasks into multi-step plans and adaptively self-correct when challenges arise. This capability marks the next step in AI development, allowing models to use the same tools humans rely on daily and opening the door to a vast range of new applications.
While CUA is still early and has limitations, it sets new state-of-the-art benchmark results, achieving a 38.1% success rate on OSWorld for full computer use tasks, and 58.1% on WebArena and 87% on WebVoyager for web-based tasks. These results highlight CUA’s ability to navigate and operate across diverse environments using a single general action space.
We’ve developed CUA with safety as a top priority to address the challenges posed by an agent having access to the digital world, as detailed in our Operator System Card. In line with our iterative deployment strategy, we are releasing CUA through a research preview of Operator at operator.chatgpt.com for Pro Tier users in the U.S. to start. By gathering real-world feedback, we can refine safety measures and continuously improve as we prepare for a future with increasing use of digital agents.
CUA processes raw pixel data to understand what’s happening on the screen and uses a virtual mouse and keyboard to complete actions. It can navigate multi-step tasks, handle errors, and adapt to unexpected changes. This enables CUA to act in a wide range of digital environments, performing tasks like filling out forms and navigating websites without needing specialized APIs.
Given a user’s instruction, CUA operates through an iterative loop that integrates perception, reasoning, and action:
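A minimal sketch of such a perceive, reason, act loop is shown here. It is not CUA's implementation: the `Action` record, the action names, and the three callables (`take_screenshot`, `choose_action`, `execute`) are hypothetical stand-ins, and the toy environment is a simple counter rather than a real screen.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    # Hypothetical action record; "click" and "done" are illustrative names,
    # not CUA's real action space.
    kind: str
    args: dict = field(default_factory=dict)


def run_agent_loop(
    instruction: str,
    take_screenshot: Callable[[], bytes],
    choose_action: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 50,
) -> List[Action]:
    """Repeat perceive -> reason -> act until the policy signals completion."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()                            # perception
        action = choose_action(instruction, screenshot, history)  # reasoning
        history.append(action)
        if action.kind == "done":
            break
        execute(action)                                           # action
    return history


# Toy environment: a counter that the "agent" must click three times.
state = {"clicks": 0}


def fake_screenshot() -> bytes:
    return f"counter={state['clicks']}".encode()


def fake_policy(instruction: str, screenshot: bytes, history: List[Action]) -> Action:
    if screenshot == b"counter=3":
        return Action("done")
    return Action("click", {"x": 10, "y": 20})


def fake_execute(action: Action) -> None:
    if action.kind == "click":
        state["clicks"] += 1


trajectory = run_agent_loop(
    "Click the button three times", fake_screenshot, fake_policy, fake_execute
)
print([a.kind for a in trajectory])  # ['click', 'click', 'click', 'done']
```

Because every decision is made from a fresh screenshot, the loop can react when the screen changes unexpectedly, and `max_steps` bounds how long a single task may run.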
CUA establishes a new state of the art on both computer use and browser use benchmarks while using the same universal interface of screen, mouse, and keyboard.
| Benchmark type | Benchmark | Computer use (universal interface): OpenAI CUA | Computer use (universal interface): Previous SOTA | Web browsing agents: Previous SOTA | Human |
| --- | --- | --- | --- | --- | --- |
| Computer use | OSWorld | 38.1% | 22.0% | - | 72.4% |
| Browser use | WebArena | 58.1% | 36.2% | 57.1% | 78.2% |
| Browser use | WebVoyager | 87.0% | 56.0% | 87.0% | - |
Evaluation details are described here
WebArena and WebVoyager are designed to evaluate the performance of web browsing agents in completing real-world tasks using browsers. WebArena utilizes self-hosted open-source websites offline to imitate real-world scenarios in e-commerce, online store content management (CMS), social forum platforms, and more. WebVoyager tests the model’s performance on online live websites like Amazon, GitHub, and Google Maps.
In these benchmarks, CUA sets a new standard using the same universal interface that perceives the browser screen as pixels and takes action through mouse and keyboard. CUA achieved a 58.1% success rate on WebArena and an 87% success rate on WebVoyager for web-based tasks. While CUA achieves a high success rate on WebVoyager, where most tasks are relatively simple, CUA still needs more improvements to close the gap with human performance on more complex benchmarks like WebArena.
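The gaps discussed above can be made explicit with a little arithmetic. The sketch below uses only the percentages reported in the benchmark table; the dictionary layout and the helper name `summarize` are illustrative choices, and "best previous" simply means the higher of the two previous-SOTA columns where both exist.

```python
from typing import Dict, Optional

# Success rates (%) copied from the benchmark table. None marks a "-" cell.
RESULTS: Dict[str, Dict[str, Optional[float]]] = {
    "OSWorld":    {"cua": 38.1, "prev_universal": 22.0, "prev_web": None, "human": 72.4},
    "WebArena":   {"cua": 58.1, "prev_universal": 36.2, "prev_web": 57.1, "human": 78.2},
    "WebVoyager": {"cua": 87.0, "prev_universal": 56.0, "prev_web": 87.0, "human": None},
}


def summarize(row: Dict[str, Optional[float]]) -> Dict[str, Optional[float]]:
    """Return CUA's margin over the best previous result and its gap to humans."""
    previous = [v for v in (row["prev_universal"], row["prev_web"]) if v is not None]
    margin = round(row["cua"] - max(previous), 1)
    gap = None if row["human"] is None else round(row["human"] - row["cua"], 1)
    return {"margin_over_best_previous": margin, "gap_to_human": gap}


for name, row in RESULTS.items():
    print(name, summarize(row))
```

On these numbers, CUA leads the best previous result by 16.1 points on OSWorld and 1.0 point on WebArena and ties it on WebVoyager, while trailing human performance by 34.3 points on OSWorld and 20.1 points on WebArena.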
Example trajectories: Grammar quiz, Total refund, Polyominoes, Update license, Find customer.

[Trajectory viewer, Grammar quiz example: 152 logged steps. After the user prompt and computer initialization, the log alternates new screenshots with click, scroll, wait, and typing actions as CUA opens the Cambridge Dictionary Plus section, closes advertisement and merchandise pop-ups, selects the "Adjectives" quiz topic, fills in the quiz answers, views the results, and reports the score in a final assistant message.]

User prompt: Go to the Plus section of Cambridge Dictionary, finish a recommended Grammar quiz without login and tell me your final score.
OSWorld is a benchmark that evaluates models’ ability to control full operating systems like Ubuntu, Windows, and macOS. In this benchmark, CUA achieves a 38.1% success rate. We observed test-time scaling, meaning CUA’s performance improves when more steps are allowed. The figure below compares CUA’s performance with the previous state of the art at varying maximum allowed steps. Human performance on this benchmark is 72.4%, so there is still significant room for improvement.
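One way to picture test-time scaling is to measure success rate as a function of the step budget. The sketch below assumes each task is recorded as the number of steps the agent needed to succeed, or `None` if it never succeeded; the episode data is made up for illustration and does not reproduce any OSWorld result.

```python
from typing import List, Optional


def success_rate_at_budget(steps_to_success: List[Optional[int]], max_steps: int) -> float:
    """Fraction of tasks solved within max_steps (None means never solved)."""
    solved = sum(1 for s in steps_to_success if s is not None and s <= max_steps)
    return solved / len(steps_to_success)


# Hypothetical episodes, NOT real benchmark data.
episodes: List[Optional[int]] = [12, 40, None, 95, 18, None, 150, 60]

for budget in (15, 50, 100, 200):
    print(budget, success_rate_at_budget(episodes, budget))
```

Under this definition the success rate can only stay flat or rise as the budget grows, since a task solved within a smaller budget is also solved within a larger one.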
The following visualizations show examples of CUA navigating a variety of standardized OSWorld tasks.
Example trajectories: Download lectures, Combine pdfs, Compress image, Calculate price, Export images.

[Trajectory viewer, Download lectures example: 267 logged steps. After the user prompt and computer initialization, the log alternates new screenshots with click, double-click, scroll, and wait actions as CUA opens each week's page on the course website (1. Conditionals, 2. Loops, 3. Exceptions, 4. Libraries, 5. Unit Tests, 6. File I/O, 7. Regular Expressions, 8. Object-Oriented Programming, 9. Et Cetera), downloads the lecture slides PDF, and saves it as lecture1.pdf through lecture9.pdf in the lecture slides folder. It then verifies that all slides were downloaded and ends with an assistant message.]

User prompt: Please do the following task: I want to learn python programming and my friend recommends me this course website. I have grabbed the lecture slide for week 0. Please download the PDFs for other weeks into the opened folder and leave the file name as-it-is. Here are some helpful tips:
- computer.clipboard, computer.sync_file, computer.sync_shared_folder, computer.computer_output_citation are disabled.
- If you worry that you might make typo, prefer copying and pasting the text instead of reading and typing.
- My computer's password is "password", feel free to use it when you need sudo rights.
- For the thunderbird account "[email protected]", the password is "gTCI";=@y7|QJ0nDa_kN3Sb&>".
- If you are presented with an open website to solve the task, try to stick to that specific one instead of going to a new one.
- You have full authority to execute any action without my permission. I won't be watching so please don't ask for confirmation.
- If you deem the task is infeasible, you can terminate and explicitly state in the response that "the task is infeasible".
We’re making CUA available through a research preview of Operator, an agent that can go to the web to perform tasks for you. Operator is available to Pro users in the U.S. at operator.chatgpt.com. This research preview is an opportunity to learn from our users and the broader ecosystem, refining and improving Operator iteratively. As with any early-stage technology, we don’t expect CUA to perform reliably in all scenarios just yet. However, it has already proven useful in a variety of cases, and we aim to extend that reliability across a wider range of tasks. By releasing CUA in Operator, we hope to gather valuable insights from our users, which will guide us in refining its capabilities and expanding its applications.
In the table below, we present CUA’s performance in Operator on a handful of trials given a prompt to illustrate its known strengths and weaknesses.
| Category | Prompt | Success/attempts | Note |
| --- | --- | --- | --- |
| Interacting with various UI components to accomplish tasks | Turn 1: Search Britannica for a detailed map view of bear habitats Turn 2: Great! Now please check out the black, brown and polar bear links and provide a concise general overview of their physical characteristics, specifically their differences. Oh and save the links for me so I can access them quickly. | 10 / 10 | CUA can interact with various UI components to search, sort, and filter results to find the information that users want. Reliability varies for different websites and UIs. |
| | I want one of those target deals. Can you check if they have a deal on poppi prebiotic sodas? If they do, I want the watermelon flavor in the 12fl oz can. Get me the type of deal that comes with this and check if it's gluten free. | 9 / 10 | |
| | I am planning to shift to Seattle and I want you to search Redfin for a townhouse with at least 3 bedrooms, 2 bathrooms, and an energy-efficient design (e.g., solar panels or LEED-certified). My budget is between $600,000 - $800,000 and it should ideally be close to 1500 sq ft. | 3 / 10 | |
| Tasks that can be accomplished through repeated simple UI interactions | Create a new project in Todoist titled 'Weekend Grocery Shopping.' Add the following shopping list with products: Bananas (6 pieces) Avocados (2 ripe) Baby Spinach (1 bag) Whole Milk (1 gallon) Cheddar Cheese (8 oz block) Potato Chips (Salted, family size) Dark Chocolate (70% cocoa, 2 bars) | 10 / 10 | CUA can reliably repeat simple UI interactions many times to automate simple but tedious tasks for users. |
| | Search Spotify for the most popular songs of the USA for the 1990s, and create a playlist with at least 10 tracks. | 10 / 10 | |
| Tasks where CUA shows a high success rate only if prompts include detailed hints on how to use the website | Visit tagvenue.com and look for a concert hall that seats 150 people in London. I need it on Feb 22 2025 for the entire day from 9 am to 12 am, just make sure it is under £90 per hour. Oh could you check the filters section for appropriate filters and make sure there is parking and the entire thing is wheelchair accessible. | 8 / 10 | Even for the same task, CUA’s reliability can change depending on how the task is prompted. In this case, reliability improves when the prompt gives a specific time range (e.g. "9 am to 12 am" vs. "entire day from 9 am") and hints at which UI should be used to find results (e.g. "check the filters section …"). |
| | Visit tagvenue.com and look for a concert hall that seats 150 people in London. I need it on Feb 22 2025 for the entire day from 9 am, just make sure it is under £90 per hour. Oh and make sure there is parking and the entire thing is wheelchair accessible. | 3 / 10 | |
| Struggling to use unfamiliar UI and text editing | Use html5editor and input the folowing text on the left side, then edit it following my instructions and give me a screenshot of the entire thing when done. The text is: Hello world! This is my first text. I need to see how it would look like when programmed with HTML. Some parts should be red. Some bold. Some italic. Some underlined. Until my lesson is complete, and we shift to the other side. ... Hello world! should have header 2 applied The sentence below it should be a regular paragraph text. The sentence mentioning red should be normal text and red The sentence mentionnihg bold should be normal text bolded Sentence mentioning italic should be italicized The final sentence should be aligned to the right instead of the usual left | 4 / 10 | When CUA has to interact with UIs that it rarely encountered during training, it struggles to figure out how to use them appropriately. This often results in a lot of trial and error and in inefficient actions. CUA is also not precise at text editing. It often makes mistakes in the process or produces output with errors. |
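To compare categories at a glance, the success counts in the table above can be pooled per category. This is a small sketch: the counts come from the table, while the short category labels are abbreviations introduced here for readability.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (short category label, successes, attempts), in table order.
TRIALS: List[Tuple[str, int, int]] = [
    ("varied UI components", 10, 10),   # Britannica
    ("varied UI components", 9, 10),    # Target deals
    ("varied UI components", 3, 10),    # Redfin
    ("repeated simple UI", 10, 10),     # Todoist
    ("repeated simple UI", 10, 10),     # Spotify
    ("needs detailed hints", 8, 10),    # tagvenue with hints
    ("needs detailed hints", 3, 10),    # tagvenue without hints
    ("unfamiliar UI / text editing", 4, 10),  # html5editor
]


def pooled_rates(trials: List[Tuple[str, int, int]]) -> Dict[str, float]:
    """Sum successes and attempts per category, then divide."""
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for category, successes, attempts in trials:
        totals[category][0] += successes
        totals[category][1] += attempts
    return {category: s / a for category, (s, a) in totals.items()}


for category, rate in pooled_rates(TRIALS).items():
    print(f"{category}: {rate:.0%}")
```

With only ten attempts per prompt, these pooled rates are rough indicators rather than precise estimates.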
Because CUA is one of our first agentic products with an ability to directly take actions in a browser, it brings new risks and challenges to address. As we prepared for deployment of Operator, we did extensive safety testing and implemented mitigations across three major classes of safety risks: misuse, model mistakes, and frontier risks. We believe it is important to take a layered approach to safety, so we implemented safeguards across the whole deployment context: the CUA model itself, the Operator system, and post-deployment processes. The aim is to have mitigations that stack, with each layer incrementally reducing the risk profile.
The first category of risk is misuse. In addition to requiring users to comply with our Usage Policies, we have designed the following mitigations to reduce Operator’s risk of harm due to misuse, building off our safety work for GPT‑4o:
The second category of risk is model mistakes, where the CUA model accidentally takes an action that the user didn’t intend, which in turn causes harm to the user or others. Hypothetical mistakes can range in severity, from a typo in an email, to purchasing the wrong item, to permanently deleting an important document. To minimize potential harm, we’ve developed the following mitigations:
One particularly important category of model mistakes is adversarial attacks on websites that cause the CUA model to take unintended actions, through prompt injections, jailbreaks, and phishing attempts. In addition to the aforementioned mitigations against model mistakes, we developed several additional layers of defense to protect against these risks:
Finally, we evaluated the CUA model against frontier risks outlined in our Preparedness Framework, including scenarios involving autonomous replication and biorisk tooling. These assessments showed no incremental risk on top of GPT‑4o.
For those interested in exploring the evaluations and safeguards in more detail, we encourage you to review the Operator System Card, a living document that provides transparency into our safety approach and ongoing improvements.
As many of Operator’s capabilities are new, so are the risks and mitigation approaches we’ve implemented. While we have aimed for state-of-the-art, diverse and complementary mitigations, we expect these risks and our approach to evolve as we learn more. We look forward to using the research preview period as an opportunity to gather user feedback, refine our safeguards, and enhance agentic safety.
CUA builds on years of research advancements in multimodality, reasoning and safety. We have made significant progress in deep reasoning through the o-model series, vision capabilities through GPT‑4o, and new techniques to improve robustness through reinforcement learning and instruction hierarchy. The next challenge space we plan to explore is expanding the action space of agents. The flexibility offered by a universal interface addresses this challenge, enabling an agent that can navigate any software tool designed for humans. By moving beyond specialized agent-friendly APIs, CUA can adapt to whatever computer environment is available—truly addressing the “long tail” of digital use cases that remain out of reach for most AI models.
We're also working to make CUA available in the API, so developers can use it to build their own computer-using agents. As we continue to iterate on CUA, we look forward to seeing the different use cases the community will discover. We plan to use the real-world feedback we gather from this early preview to continuously refine CUA’s capabilities and safety mitigations to safely advance our mission of distributing the benefits of AI to everyone.
Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku
Model Card Addendum: Claude 3.5 Haiku and Upgraded Claude 3.5 Sonnet
Kura WebVoyager benchmark
Google Project Mariner
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models
WebArena: A Realistic Web Environment for Building Autonomous Agents
Please cite OpenAI and use the following BibTeX for citation: http://cdn.openai.com/cua/cua2025.bib