Project Astra and Agent Mode Are AI with Eyes and Hands
One of the most sci-fi-esque demos at I/O was Google’s progress with AI that can see and act on our behalf. They’ve been working on this in research (you may recall Project Astra from last year), and now those advances are hitting real products in two ways: Gemini Live (integrating Astra’s vision) and Agent Mode (integrating the tool-using Project Mariner). For me, this was the “Jarvis moment” – glimpses of an assistant that chats with you, but can also perceive your world and do things for you.
Project Astra → Gemini Live
Project Astra was Google's "concept car" of a universal AI assistant – a multimodal bot that can understand live visuals and audio, not just text. At I/O 2025, Sundar Pichai announced that Astra's capabilities have been folded into the Gemini Live feature of the Gemini app. In practical terms, this means you can now point your phone's camera at something or share your screen, and the AI will analyze it and help you in real time. This is already live for all Android users (with iOS rolling out starting today) via the Gemini app.
What can it do? The demos showed scenarios like holding up your phone to a pair of running shoes you're considering – and the AI, seeing through your camera, can overlay information or answer questions about the item. People are using it for things like interview preparation and marathon training, where the AI "observes" (via camera) and offers guidance. It can read text it sees, identify objects, or watch you perform an exercise and give tips – wild! Think of it as Google Lens on steroids, fused with a coach. One Googler described Astra (now Gemini Live) as an AI that can "see your world and provide hands-free help," moving us closer to that ideal assistant from science fiction.
As someone who's fiddled with plenty of AR and no-code tools, I find this fascinating. It lowers the barrier for contextual AI assistance. Instead of describing a problem to the AI, you just show it. Content-wise, this hints at a future where visual content and real-time data become part of SEO/optimization. (Will businesses need to optimize how their products are recognized by AI via camera? Possibly yes – maybe through machine-readable labels or AR cues. It's a new frontier.)
Project Mariner → Agent Mode
The second half of this is giving the AI the ability to act online for you. Google's Project Mariner was an experiment in "agentic AI" – basically an AI that can use tools, click buttons, fill forms, etc., under your command. At I/O, they announced those capabilities are coming to products as Agent Mode.
For developers, Google is opening up these agent APIs so that third-party apps can let AI agents interact with them (early partners include Automation Anywhere and UiPath for workflows). For consumers, the Gemini app will soon get an Agent Mode that can do multi-step tasks. The example given: "Help me find and book an apartment." The AI will search listings (e.g., on Zillow), refine the results with your criteria, use integrated tools or forms to contact landlords, and even schedule tours for you – all while you monitor it or give it the go-ahead. It's like having a very proactive assistant who clicks through websites so you don't have to.
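To make the "find and book an apartment" flow concrete, here's a toy sketch of a supervised multi-step agent loop. Everything here – the class, function names, and stubbed listings – is my own illustrative assumption, not Google's actual Agent Mode API; the point is the shape: search, filter by criteria, then gate any side-effecting step behind user approval.

```python
# Toy sketch of a supervised multi-step agent flow. All names here
# (AgentRun, search_listings, contact_landlord) are hypothetical --
# Google has not published the Agent Mode API.

from dataclasses import dataclass, field


@dataclass
class Listing:
    address: str
    rent: int


@dataclass
class AgentRun:
    approved_actions: list = field(default_factory=list)

    def search_listings(self, max_rent: int) -> list[Listing]:
        # A real agent would browse a listings site; here we stub the data.
        catalog = [Listing("12 Oak St", 1450), Listing("9 Elm Ave", 2100)]
        return [l for l in catalog if l.rent <= max_rent]

    def request_approval(self, action: str, auto_approve: bool = False) -> bool:
        # The user stays in the loop: side-effecting steps need a go-ahead.
        if auto_approve:
            self.approved_actions.append(action)
            return True
        return False

    def contact_landlord(self, listing: Listing) -> str:
        action = f"contact landlord for {listing.address}"
        if self.request_approval(action, auto_approve=True):
            return f"Tour requested at {listing.address}"
        return "Skipped (not approved)"


run = AgentRun()
matches = run.search_listings(max_rent=1500)
print([l.address for l in matches])      # ['12 Oak St']
print(run.contact_landlord(matches[0]))  # Tour requested at 12 Oak St
```

The key design choice is that read-only steps (searching, filtering) run freely, while anything with real-world consequences passes through an approval gate – which matches the "you monitor or give it the go-ahead" framing.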
Crucially, Google stressed that these agents operate under your oversight. They're adopting standards like Model Context Protocol (MCP) – an open framework proposed by Anthropic – so different AI agents and tools can work together safely. That's a bit technical, but it means Google is thinking about interoperability and security (no rogue AI runs off with your credit card… hopefully).
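The interoperability idea behind MCP is simple: each tool advertises a name, a human-readable description, and a JSON Schema for its inputs, so any compliant agent can discover the tool and validate a call before executing it. Here's a minimal, MCP-flavoured descriptor – an illustrative sketch in that spirit, not code from the official MCP SDK (and `schedule_tour` is a made-up tool):

```python
# An MCP-style tool descriptor: name + description + JSON Schema inputs.
# This dict is illustrative; it is not taken from the official MCP SDK.

import json

schedule_tour_tool = {
    "name": "schedule_tour",
    "description": "Book an apartment viewing on the user's behalf.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "listing_id": {"type": "string"},
            "preferred_time": {"type": "string", "format": "date-time"},
        },
        "required": ["listing_id", "preferred_time"],
    },
}

# An agent that speaks the protocol can check a proposed call against the
# schema before running it -- one reason a shared standard helps with safety.
print(json.dumps(schedule_tour_tool["inputSchema"]["required"]))
# ["listing_id", "preferred_time"]
```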
From a content strategy viewpoint, Agent Mode could be a game-changer for conversion-oriented tasks. Imagine a user saying, "Find me a budget gaming laptop and buy it for me when it hits $800." The AI could compare products, perhaps consult reviews or specs (content we or others wrote), then handle the checkout when conditions are met. Google actually previewed something similar for shopping: an AI shopping agent that will buy an item when its price drops to your target.
For businesses, this means the AI might become the customer! We'll need to provide data in a way the AI can use (think robust structured data, APIs for inventory, etc.). It also means competition to be the choice the AI agent picks – a new kind of SEO/marketing challenge.
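What does "data the AI can use" look like in practice? The most established option is schema.org structured data. Below is a sketch of Product/Offer markup – the vocabulary (`Product`, `Offer`, `price`, `priceCurrency`) is real schema.org, but the values are made up – showing the kind of unambiguous price and stock signal a shopping agent could check against a user's target:

```python
# Sketch of schema.org Product markup: machine-readable data an AI shopping
# agent could parse to compare price and availability. The vocabulary is the
# real schema.org one; the product and values are invented for illustration.

import json

product_jsonld = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example gaming laptop",
    "sku": "GL-1000",
    "offers": {
        "@type": "Offer",
        "price": "799.00",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock",
    },
}

# Embedded in a page as <script type="application/ld+json">...</script>,
# this gives an agent an exact price to test against "buy when it hits $800".
print(product_jsonld["offers"]["price"])  # 799.00
```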
Project Astra and Agent Mode show Google's AI moving from passive answerer to active doer. It can see, it can click, it can transact. For users, this promises huge convenience – less fiddling with apps or forms. For us marketers and product folks, it's time to ensure our systems can talk to AI agents.
If you've been wondering, "Can the agent do my taxes next year?", we might not be far from that! It's both exciting and a bit scary – we're essentially on the path to personal AI concierges. This is one area I'll be experimenting with heavily, to see how content influences the agent's decisions and how we can earn that "agent recommendation".