Gemini 3 Flash gets Agentic Vision to deliver more accurate, evidence-based image understanding

Google has introduced Agentic Vision for Gemini 3 Flash, a new capability that improves how the model understands and responds to image-based prompts.

January 29, 2026 / 18:43 IST
  • Google launches Agentic Vision for Gemini 3 Flash to enhance image response accuracy
  • Agentic Vision uses a Think, Act, Observe loop for active visual analysis
  • Feature now available via Gemini app and API, with future expansion planned

Google has unveiled a new capability called Agentic Vision for Gemini 3 Flash, aimed at making image-based responses more accurate and reliable. The update addresses a long-standing limitation of frontier AI models, which often process images in a single pass and are forced to guess when they miss fine-grained details such as serial numbers, small text, or distant objects.

With Agentic Vision, Gemini 3 Flash moves away from passive image interpretation and instead treats visual understanding as an active, investigative process. The model is designed to ground its answers in visual evidence by combining visual reasoning with code execution and, over time, additional tools. This approach allows the AI to inspect, manipulate, and reason about images in a structured way before producing a final response.

At the core of Agentic Vision is what Google describes as a Think, Act, Observe loop.

In the Think phase, the model analyses the user’s prompt along with the initial image and formulates a multi-step plan. Rather than jumping straight to an answer, Gemini 3 Flash decides what actions are needed to extract the required information.

In the Act phase, the model generates and executes Python code to manipulate or analyse the image. This can include cropping specific regions, zooming into small details, rotating the image, drawing annotations, or running calculations such as counting objects or measuring distances.

In the Observe phase, the transformed image is added back into the model’s context, allowing it to reassess the new visual information with greater clarity before responding.
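The shape of that loop can be illustrated with a small, dependency-free Python sketch. It is not Google's implementation: the image is a plain list of pixel rows rather than a real image object, the plan returned by `think` is hard-coded for illustration, and every function name here is hypothetical.

```python
# Illustrative sketch only. The "image" is a list of rows of grayscale values.
Image = list[list[int]]

def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Return the rectangular region starting at (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]

def zoom(img: Image, factor: int) -> Image:
    """Enlarge the image by repeating each pixel (nearest-neighbour)."""
    out = []
    for row in img:
        wide = [px for px in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out

def rotate_cw(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(r) for r in zip(*img[::-1])]

def think(prompt: str, img: Image) -> list[tuple[str, tuple]]:
    # Assumption: a fixed plan. In the real system the model writes the plan.
    return [("crop", (1, 1, 2, 2)), ("zoom", (2,))]

def act(img: Image, step: tuple[str, tuple]) -> Image:
    """Run one planned operation on the image."""
    name, args = step
    ops = {"crop": crop, "zoom": zoom, "rotate_cw": rotate_cw}
    return ops[name](img, *args)

def run_loop(prompt: str, img: Image) -> list[Image]:
    context = [img]                 # the model starts with the original image
    for step in think(prompt, img):
        img = act(img, step)
        context.append(img)         # Observe: the new view joins the context
    return context

image = [
    [0, 0, 0, 0],
    [0, 5, 6, 0],
    [0, 7, 8, 0],
    [0, 0, 0, 0],
]
context = run_loop("What is in the centre?", image)
for view in context:
    print(view)
```

Each transformed view is appended to `context` instead of replacing the original, which mirrors the article's description of the transformed image being added back into the model's context.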

This process enables Gemini 3 Flash to go beyond simply describing what it sees. For example, when asked to count the fingers on a hand, the model does not rely on a rough visual estimate. Instead, it uses Python code to draw bounding boxes and numeric labels over each finger it detects. This annotated image acts as a visual scratchpad, so the final answer rests on marked-up, pixel-level evidence rather than probabilistic guessing.
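A rough idea of what such a scratchpad involves is sketched below in plain Python. The box coordinates are invented for illustration, the "image" is a grid of characters rather than real pixels, and the `annotate` helper is hypothetical. The code the model actually generates would draw on the real image.

```python
def annotate(width, height, boxes):
    """Draw each box outline on a blank canvas and label it 1, 2, 3, ...

    boxes holds (left, top, right, bottom) tuples in inclusive coordinates.
    Labels are single characters, so this sketch suits fewer than 10 boxes.
    """
    canvas = [["." for _ in range(width)] for _ in range(height)]
    for label, (left, top, right, bottom) in enumerate(boxes, start=1):
        for x in range(left, right + 1):      # top and bottom edges
            canvas[top][x] = "#"
            canvas[bottom][x] = "#"
        for y in range(top, bottom + 1):      # left and right edges
            canvas[y][left] = "#"
            canvas[y][right] = "#"
        canvas[top][left] = str(label)        # numeric label in the corner
    return ["".join(row) for row in canvas], len(boxes)

# Assumption: hard-coded detections standing in for the model's output.
finger_boxes = [
    (0, 0, 2, 3),
    (4, 0, 6, 3),
    (8, 0, 10, 3),
    (12, 0, 14, 3),
    (16, 1, 18, 3),
]
rows, count = annotate(19, 4, finger_boxes)
print("\n".join(rows))
print("count:", count)
```

The count comes from the list of labelled boxes, not from a separate estimate, so the annotated picture and the number cannot disagree.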

The same approach applies to other challenging visual tasks. When Gemini 3 Flash encounters fine-grained details, Agentic Vision can automatically zoom into relevant areas of an image. It can also parse dense tables embedded in images and execute Python code to extract and visualise the data. This is particularly important for tasks involving multi-step visual arithmetic, where traditional large language models are prone to hallucination. By offloading calculations to a deterministic Python environment, Gemini 3 Flash computes the result with code that can be checked, rather than estimating it.
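As a minimal sketch of that offloading step, the snippet below assumes the table has already been transcribed from the image into comma-separated text. The table contents, column names and the `line_totals` helper are all made up for illustration. The arithmetic is then done by Python's `decimal` module instead of being guessed.

```python
import csv
import io
from decimal import Decimal

# Assumption: text as it might be transcribed from a table in an image.
TABLE_TEXT = """item,quantity,unit_price
widget,3,19.99
gasket,12,0.45
bracket,7,4.10
"""

def line_totals(table_text):
    """Return {item: quantity * unit_price} using exact decimal arithmetic."""
    reader = csv.DictReader(io.StringIO(table_text))
    return {
        row["item"]: int(row["quantity"]) * Decimal(row["unit_price"])
        for row in reader
    }

totals = line_totals(TABLE_TEXT)
grand_total = sum(totals.values(), Decimal("0"))

for item, total in totals.items():
    print(f"{item}: {total}")
print(f"grand total: {grand_total}")
```

`Decimal` is used here instead of floats so that currency-style values add up exactly. Running the same code on the same table always gives the same totals.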

Google says this capability is beginning to roll out in the Gemini app using the Thinking model. Developers can already access Agentic Vision through the Gemini API in Google AI Studio and Vertex AI, making it available for production use in applications that require reliable visual analysis.

Looking ahead, Google plans to expand Agentic Vision further. Future updates are expected to improve the model’s ability to rotate images or perform visual mathematics without requiring explicit prompts to trigger these actions. Over time, Gemini 3 Flash will also gain access to additional tools, including web search and reverse image search, to ground its understanding of the world even more deeply. Google has indicated that Agentic Vision will not be limited to Gemini 3 Flash and will eventually be available across other Gemini models as well.

Taken together, Agentic Vision represents a meaningful shift in how AI systems interpret images. By turning vision into an active, tool-driven process, Gemini 3 Flash moves closer to delivering image responses that users can trust, especially in scenarios where accuracy and detail matter most.

 


Ayush Mukherjee
first published: Jan 29, 2026 06:43 pm
