Google LLC is developing an AI system that can navigate and operate a web browser on its own, with a release possible in December, according to a recent report by The Information. The system, known internally as “Project Jarvis,” aims to streamline user workflows by automating repetitive online tasks such as shopping, conducting research, and booking travel.
Project Jarvis is built on Google’s Gemini 2.0 large language model, which is designed to deliver substantial improvements in understanding and generating human-like text. According to sources cited by The Information, the agent is tailored to Google Chrome, where it interprets screenshots, clicks buttons, and enters text, mimicking a user’s actions to complete tasks across webpages. The current version reportedly pauses for “a few seconds” between actions, and it is unclear whether the released version will have the same delay.
The report follows closely on Anthropic PBC’s recent release of new models, one of which, Claude 3.5 Sonnet, can operate a computer’s interface by moving the cursor, entering text, and clicking buttons. The two approaches differ in scope: Anthropic’s model can interact directly with the computer itself, while Project Jarvis is expected to be limited to web-based interactions within the Chrome browser.
The trend toward developing AI agents that can engage with or interpret on-screen elements is gaining momentum. Microsoft, for instance, recently revealed Copilot Vision, which, although not yet widely available, will analyze images on webpages and respond to user inquiries based on the content. Meanwhile, Apple is also entering this arena with its upcoming Apple Intelligence platform. Unlike Google’s browser-focused approach, Apple is integrating AI into device features such as Siri, which would allow the assistant to respond contextually to on-screen information.
While each company’s approach to AI-driven interactions varies, they share a common goal: creating AI agents capable of handling a range of digital tasks autonomously. This marks a new phase in AI innovation, where systems that interact with and respond to content on screens could soon become standard.