ChatGPT maker OpenAI has unveiled Operator, an artificial intelligence (AI) agent that can go to the web to perform tasks for users.
AI agents are systems that can perform tasks autonomously or semi-autonomously by processing data, making decisions, and taking actions based on their programming and training.
In a blog post, Microsoft-backed OpenAI says that, using its own browser, Operator can look at a webpage and interact with it by typing, clicking and scrolling.
The company says Operator is currently a research preview, meaning it has limitations and will evolve based on user feedback.
“Operator is one of our first agents, which are AIs capable of doing work for you independently – you give it a task and it will execute it.”
According to OpenAI, Operator can be asked to handle a wide variety of repetitive browser tasks, such as filling out forms, ordering groceries and even creating memes.
It explains that the ability to use the same interfaces and tools that humans interact with on a daily basis broadens the utility of AI, helping people save time on everyday tasks, while opening up new engagement opportunities for businesses.
“To ensure a safe and iterative rollout, we are starting small. This research preview allows us to learn from our users and the broader ecosystem, refining and improving as we go.”
Operator is powered by a new model called Computer-Using Agent (CUA). OpenAI notes that CUA combines GPT-4o's vision capabilities with advanced reasoning through reinforcement learning, and is trained to interact with graphical user interfaces – the buttons, menus and text fields people see on a screen.
“Operator can ‘see’ (through screenshots) and ‘interact’ (using all the actions a mouse and keyboard allow) with a browser, enabling it to take action on the web without requiring custom API [application programming interface] integrations,” says the firm.
“If it encounters challenges or makes mistakes, Operator can leverage its reasoning capabilities to self-correct. When it gets stuck and needs assistance, it simply hands control back to the user, ensuring a smooth and collaborative experience.”
OpenAI points out that while CUA is still in the early stages and has limitations, it sets new benchmark results in WebArena and WebVoyager, two key browser use benchmarks.
The AI firm says that to get started, users describe the task they would like done and Operator can handle the rest.
Users can choose to take over control of the remote browser at any point, and Operator is trained to proactively ask the user to take over for tasks that require logging in, entering payment details or solving CAPTCHAs.
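The loop OpenAI describes (look at a screenshot, choose a mouse or keyboard action, and hand control back to the user for logins, payments, CAPTCHAs or when stuck) can be sketched roughly as follows. This is an illustrative sketch only, not OpenAI's implementation: every name here (`Action`, `run_agent`, `SENSITIVE`, `max_steps`) is hypothetical, and the model and browser are stood in for by plain callables.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical labels for steps that should be handed to the user.
SENSITIVE = {"login", "payment", "captcha"}


@dataclass
class Action:
    kind: str          # e.g. "click", "type", "scroll", "done", or a sensitive label
    detail: str = ""   # free-text description of the target or the text to type


def run_agent(
    task: str,
    take_screenshot: Callable[[], str],
    choose_action: Callable[[str, str], Action],
    perform: Callable[[Action], None],
    max_steps: int = 20,
) -> List[str]:
    """Run a see-then-act loop and return a log of what happened."""
    log: List[str] = []
    for _ in range(max_steps):
        screen = take_screenshot()             # "see" the current page
        action = choose_action(task, screen)   # stand-in for the model's decision
        if action.kind == "done":
            log.append("done")
            return log
        if action.kind in SENSITIVE:
            # Do not perform the step; return control to the user instead.
            log.append(f"handoff:{action.kind}")
            return log
        perform(action)                        # "interact" via mouse or keyboard
        log.append(f"{action.kind}:{action.detail}")
    # Step budget exhausted: treat the agent as stuck and hand back control.
    log.append("handoff:stuck")
    return log


if __name__ == "__main__":
    # Scripted stand-in for the model: two ordinary steps, then a payment step.
    script = iter([
        Action("click", "search box"),
        Action("type", "milk"),
        Action("payment"),
    ])
    result = run_agent(
        "order groceries",
        take_screenshot=lambda: "<screenshot>",
        choose_action=lambda task, screen: next(script),
        perform=lambda action: None,
    )
    print(result)
```

In this sketch the sensitive step is never passed to `perform`; the loop stops and reports a handoff, which mirrors the described behaviour of asking the user to take over rather than entering credentials itself.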
Users can personalise their workflows in Operator by adding custom instructions, either for all sites or for specific ones, such as setting preferences for airlines on Booking.com. Operator lets users save prompts for quick access on the homepage, ideal for repeated tasks like restocking groceries, says OpenAI.
Similar to using multiple tabs on a browser, users can have Operator run multiple tasks simultaneously by creating new conversations.
Last year, US-based start-up Anthropic unveiled its AI agent, which allows developers to direct the agent to use computers the way people do by looking at a screen, moving a cursor, clicking buttons and typing text.
Commenting on the rise of AI agents, John Roese, global CTO and chief AI officer at Dell, says: “In the consumer world, we’ve seen early agent approaches with virtual assistants, chatbots and navigation apps.
“In 2025, a new, more advanced set of agents will emerge. These agents will operate autonomously, communicate in natural language and interact with the world around them, including working in teams of other agents and humans. They will also be fine-tuned and optimised to perform assigned, specific skills, like coding, code review, infrastructure administration, business planning and cyber security.”
Market research firm IDC says the rise of copilots from the GenAI boom of 2022 is quickly giving way to AI agents – fully automated software components that are empowered to use knowledge and skills to assess situations.
Gartner notes that autonomous AI can plan and take action to achieve goals set by the user. It predicts that by 2028, at least 15% of day-to-day work decisions will be made autonomously through agentic AI, up from 0% in 2024.