Google’s Gemini Live, first revealed at last year’s Made by Google event, is getting significant upgrades, including visual overlays during camera-feed sharing and a new audio model designed for more natural conversations. Together, the changes aim to make Gemini Live a more helpful and responsive digital assistant.
Since its introduction, Gemini Live has seen several improvements, most notably the ability to share camera feeds and screens. Google has now announced an upgrade to its camera-sharing capabilities, along with a new native audio model that makes conversations with the AI chatbot feel more natural.
During its presentation on the forthcoming Google Pixel 10 series, Google detailed upcoming improvements to Gemini Live on Android. A key addition is visual overlays that highlight specific objects within the camera feed. These cues appear as white-bordered rectangles around the objects of interest, with the surrounding area slightly dimmed so the highlighted object stands out.
The “visual guidance” feature is intended to assist users in quickly locating and identifying items within the camera’s field of view. Examples of intended uses include highlighting the correct button on a machine, identifying a specific bird within a flock, or pinpointing the right tool for a particular project. The feature also extends to providing advice, such as recommending appropriate footwear for a specific occasion.
The visual guidance capability can also handle more challenging scenarios. A Google product manager recounted an international trip during which they struggled to interpret foreign-language parking signs, road markings, and local regulations. They pointed the camera at the scene and asked Gemini Live whether parking was allowed. Gemini Live consulted the local rules, translated the signs, and highlighted an area of the street offering free parking for two hours.
Visual guidance will be available on the Google Pixel 10 series at launch and will roll out to other Android devices the following week, with iOS support planned in the weeks after. A Google AI Pro or Ultra subscription will not be required to use the feature.
Alongside the visual overlays, Google is implementing a new native audio model within Gemini Live. This model is designed to facilitate more responsive and expressive conversations.
The new audio model adapts its delivery to the context of the conversation. When discussing a stressful topic, for example, it responds in a calmer, more measured tone.
Users will also have control over the audio model’s speech characteristics. If a user finds it hard to keep up with Gemini’s speech, they can ask it to speak more slowly; when time is short, they can tell it to speed up.
The system can also deliver narratives from specific perspectives. As Google stated in its blog post, users can “Ask Gemini to tell you about the Roman empire from the perspective of Julius Caesar himself, and get a rich, engaging narrative complete with character accents.”
This article was updated at 7:50 PM ET to clarify details about the native audio model and incorporate demo assets from Google’s blog post.