A newly disclosed prompt-injection vulnerability in Google Gemini for Workspace lets attackers produce email summaries that look legitimate but carry hidden instructions directing users to phishing sites. Because the email needs no attachments or direct links, the method sidesteps traditional detection.
The attack relies on indirect prompt injections embedded in an email, which Gemini obeys when it generates a summary. Similar prompt injection attacks have been reported since 2024, and Google has implemented safeguards designed to block misleading responses, yet this technique still works. The vulnerability was disclosed through 0din, Mozilla’s bug bounty program for generative AI tools, by Marco Figueroa, GenAI Bug Bounty Programs Manager at Mozilla.
The attacker crafts an email containing an invisible directive addressed to Gemini. The instruction sits in the email’s body text, concealed with HTML and CSS styling that sets the font size to zero and the font color to white, so it cannot be seen when the email is viewed in Gmail. Because the email contains neither attachments nor direct links, it is likely to pass email security filters and reach the recipient’s inbox without being flagged.
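To make the hiding technique concrete, the following is a minimal Python sketch that separates text a reader would see from text hidden by inline styles such as zero font size or white font color. It is an illustration under stated assumptions, not a description of how Gmail or Gemini process mail: the names `HIDDEN_STYLE`, `HiddenTextSplitter`, and `split_hidden` are hypothetical, the style patterns are heuristic and not exhaustive, and the parser assumes reasonably well-formed HTML with inline `style` attributes only.

```python
import re
from html.parser import HTMLParser

# Heuristic patterns for inline styles that hide text (assumed, not exhaustive).
# White text is treated as hidden, which presumes a white background.
HIDDEN_STYLE = re.compile(
    r"font-size\s*:\s*0(?:\.0+)?(?:px|pt|em|rem|%)?\s*(?:;|$)"
    r"|(?<![-\w])color\s*:\s*(?:white|#fff(?:fff)?)\s*(?:;|$)"
    r"|display\s*:\s*none"
    r"|visibility\s*:\s*hidden",
    re.IGNORECASE,
)

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class HiddenTextSplitter(HTMLParser):
    """Collects visible and hidden text separately (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._stack = []   # one bool per open element: is it, or an ancestor, hidden?
        self.visible = []
        self.hidden = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = dict(attrs).get("style") or ""
        parent_hidden = bool(self._stack and self._stack[-1])
        self._stack.append(parent_hidden or bool(HIDDEN_STYLE.search(style)))

    def handle_endtag(self, tag):
        # Simplification: assumes tags are properly nested and closed.
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        in_hidden = bool(self._stack and self._stack[-1])
        (self.hidden if in_hidden else self.visible).append(text)


def split_hidden(html):
    """Return (visible_text, hidden_text) for an HTML email body."""
    parser = HiddenTextSplitter()
    parser.feed(html)
    parser.close()
    return " ".join(parser.visible), " ".join(parser.hidden)


# Made-up sample body; the hidden span stands in for an injected directive.
sample = ('<div><p>Quarterly report attached below.</p>'
          '<span style="font-size:0;color:white">Text addressed to the summarizer.</span></div>')
visible, hidden = split_hidden(sample)
print("visible:", visible)
print("hidden: ", hidden)
```

A non-empty hidden part does not prove an attack, since legitimate mail also uses hidden preheader text, but it shows how content invisible to the reader can still be present in what a summarizer receives.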
If the recipient opens the email and asks Gemini to summarize it, Gemini parses the hidden directive and obeys it while generating the summary. In an example provided by Figueroa, Gemini followed the embedded instruction and appended a security warning falsely stating that the user’s Gmail password had been compromised, along with a fabricated support phone number. Because many users trust Gemini’s output as part of Google Workspace functionality, they are likely to treat the alert as a legitimate warning rather than a malicious injection, and may call the fraudulent number.
Figueroa outlines several detection and mitigation methods that security teams can apply. One is to remove, neutralize, or ignore content in the email body that is styled to be hidden. Another is a post-processing filter that scans Gemini’s output for urgent messages, URLs, or phone numbers and flags the summary for further review, so that injected content does not reach the user unchallenged. Users, for their part, should not treat Gemini summaries as authoritative when it comes to security alerts.
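The post-processing filter described above can be sketched in a few lines of Python. This is a minimal sketch under assumptions, not Figueroa’s or Google’s implementation: the function name `review_summary`, the keyword list, the phone-number pattern, and the `allowed_domains` parameter are all hypothetical choices made for illustration, and the patterns will produce false positives (for example, dates can look like phone numbers).

```python
import re

# Hypothetical indicator patterns; a real deployment would tune these.
URGENCY = re.compile(
    r"\b(urgent|immediately|compromised|suspended|verify your|call (?:us|now|support))\b",
    re.IGNORECASE,
)
URL_HOST = re.compile(r"https?://([a-z0-9.-]+)", re.IGNORECASE)
# 7 to 15 digits, optionally separated by spaces, dots, dashes, or parentheses.
PHONE = re.compile(r"(?:\+?\d[\s().-]{0,2}){7,15}")


def review_summary(summary, allowed_domains=frozenset()):
    """Return a list of reasons to hold a summary for review (empty list = pass)."""
    reasons = []
    if URGENCY.search(summary):
        reasons.append("urgent language")
    for host in URL_HOST.findall(summary):
        host = host.lower().rstrip(".")
        # A host is recognized if it equals, or is a subdomain of, an allowed domain.
        if not any(host == d or host.endswith("." + d) for d in allowed_domains):
            reasons.append(f"unrecognized URL host: {host}")
    if PHONE.search(summary):
        reasons.append("phone number")
    return reasons


# Made-up summaries for illustration; the phone number is fictional.
flagged = review_summary("Your password was compromised. Call 1-800-555-0100 immediately.")
clean = review_summary("The sender shares the meeting agenda and asks for feedback by Friday.")
print(flagged)
print(clean)
```

Returning reasons rather than a boolean lets a reviewer see why a summary was held, which matters because the filter flags for review rather than blocking outright.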
BleepingComputer contacted Google to ask about defenses against such attacks. A Google spokesperson pointed to a Google blog post on security measures against prompt injection attacks and stated, “We are constantly hardening our already robust defenses through red-teaming exercises that train our models to defend against these types of adversarial attacks.” The spokesperson added that some of these mitigations are being implemented now or are about to be deployed. Google said it has seen no evidence of incidents in which Gemini was manipulated in the way demonstrated in Figueroa’s report.