Is Google’s Reveal of Gemini’s Impact Progress or Greenwashing?

According to a technical paper from Google, accompanied by a blog post on their website, the estimated energy consumption of “the median Gemini Apps text prompt” is 0.24 watt-hours (Wh). The water consumption is 0.26 milliliters, about five drops of water according to the blog post, and the carbon footprint is 0.03 gCO2e. Notably, the estimates do not cover image or video prompts.

What’s the magnitude of 0.24 Wh? If you send 30 median-like prompts per day for a year, you will have used about 2.63 kWh of electricity. That’s the same as running your dishwasher 3-5 times, depending on its energy label.
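The yearly figure is just the per-prompt estimate multiplied out, as this quick sanity check shows:

```python
# Back-of-the-envelope check of the yearly energy figure.
wh_per_prompt = 0.24    # Google's median Gemini Apps text prompt
prompts_per_day = 30
days_per_year = 365

yearly_kwh = wh_per_prompt * prompts_per_day * days_per_year / 1000
print(f"{yearly_kwh:.3f} kWh per year")  # → 2.628 kWh per year

# A dishwasher cycle uses very roughly 0.5-1 kWh depending on its
# energy label, hence the "3-5 dishwasher runs" comparison.
```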

Google’s disclosure of the environmental impact of their Gemini models has given rise to a fresh round of debate on the environmental impact of AI and how to measure it.

On the surface, these numbers sound reassuringly small, but the more closely you look, the more complicated the story becomes. Let’s dive in. 

Measurement scope

Let’s take a look at what is included and what is omitted in Google’s estimates of the median Gemini text prompt.

Inclusions

The scope of their assessment is “material energy sources under Google’s operational control—i.e. the ability to implement changes to behavior.” Specifically, they decompose LLM serving energy consumption as:

  • AI accelerator energy (TPUs – Google’s counterpart to the GPU), including networking between accelerators in the same AI computer. These are direct measurements during serving. 
  • Active CPU and DRAM energy – although the AI accelerators (GPUs or TPUs) receive the most attention in the literature, the CPU and memory also use noticeable amounts of energy. 
  • Energy consumed by idle machines kept ready to absorb traffic spikes.
  • Overhead energy, i.e. the infrastructure supporting data centers – cooling systems, power conversion, and other overhead within the data center. This is taken into account through the PUE metric – a factor that you multiply measured energy consumption by – and they assume a PUE of 1.09.
  • Energy not only from the LLM that generates the response users see, but also from supporting models that perform scoring, ranking, classification, etc.
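To make the PUE adjustment concrete, here is a minimal sketch. PUE (Power Usage Effectiveness) is the ratio of total facility energy to IT equipment energy, so measured IT energy is multiplied by the PUE to fold in cooling and power-conversion overhead. The 0.22 Wh input below is a made-up illustration, not a figure from the paper:

```python
# PUE scales measured IT energy up to total facility energy,
# accounting for cooling, power conversion, and other overhead.
PUE = 1.09  # the value Google assumes in the paper

def total_energy_wh(it_energy_wh: float, pue: float = PUE) -> float:
    """Total facility energy implied by measured IT energy and a PUE."""
    return it_energy_wh * pue

# Hypothetical example: 0.22 Wh measured at the accelerators/CPU/DRAM
# becomes ~0.24 Wh once the 9% facility overhead is included.
print(round(total_energy_wh(0.22), 3))  # → 0.24
```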

Omissions

Here is what is not included: 

  • All networking before a prompt hits the AI computer, i.e. external networking and the internal networking that routes queries to the AI computer.
  • End-user devices, i.e. our phones, laptops, etc.
  • Model training and data storage

Progress or greenwashing?

Above, I outlined the objective facts of the paper. Now, let’s look at different perspectives on the figures. 

Progress

We can hail Google’s publication because:

  • Google’s paper stands out because of the detail behind it. They included CPU and DRAM, which is unfortunately uncommon. Meta, for instance, only measures GPU energy.
  • Google used the median energy consumption rather than the average. The median is not influenced by outliers such as very long or very short prompts and thus arguably tells us what a “typical” prompt consumes. 
  • Something is better than nothing. It is a big step forward from back-of-the-envelope estimates (guilty as charged), and it may pave the way for more detailed studies in the future.
  • Hardware manufacturing and end-of-life costs are included.

Greenwashing

We can criticize Google’s paper because: 

  • It lacks aggregate figures – ideally we would like to know the total impact of their LLM services and what percentage of Google’s total footprint they account for.
  • The authors do not define what the median prompt looks like, e.g. how long it is and how long a response it elicits.
  • They used the median energy consumption rather than the average. Yes, you read that right. This can be viewed as either positive or negative. The median “hides” the effect of high-complexity use cases, e.g. very complex reasoning tasks or summaries of very long texts. 
  • Carbon emissions are reported using the market-based approach (relying on energy procurement certificates) rather than location-based grid data that reflects the actual carbon emissions of the energy they used. Had they used the location-based approach, the carbon footprint would have been 0.09 gCO2e per median prompt instead of 0.03 gCO2e.
  • LLM training costs are not included. The debate about the role of training costs in total costs is ongoing. Does it play a small or big part of the total number? We do not have the full picture (yet). But, we do know that for some models, it takes hundreds of millions of prompts to reach cost parity, which suggests that model training may be a significant factor in the total energy costs.
  • They did not disclose their data, so we cannot double-check their results.
  • The methodology is not entirely clear. For instance, it is unclear how they arrived at the scope 1 and 3 emissions of 0.010 gCO2e per median prompt. 
  • Google’s water use estimate only considers on-site water consumption and excludes off-site sources such as the water used to generate the electricity they consume, which is contrary to standard practice.
  • They exclude emissions from external networking. However, a life cycle assessment of Mistral AI’s Large 2 model shows that network traffic of tokens accounts for a minuscule part of the total environmental costs of LLM inference (<1%), as does end-user equipment (3%).
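The median-versus-mean point above is easy to see with made-up numbers: a small tail of very heavy prompts pulls the mean far above the median, so the median alone “hides” the tail.

```python
import statistics

# Hypothetical per-prompt energy figures in Wh: mostly light prompts,
# plus a few heavy reasoning / long-context prompts in the tail.
prompt_energies_wh = [0.2] * 90 + [0.3] * 8 + [5.0, 12.0]

print(statistics.median(prompt_energies_wh))         # → 0.2  (a "typical" prompt)
print(round(statistics.mean(prompt_energies_wh), 3)) # → 0.374 (pulled up by the tail)
```

A fleet-level energy bill scales with the mean, not the median, which is why the choice of statistic cuts both ways.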

Gemini vs OpenAI ChatGPT vs Mistral

Google’s publication follows disclosures – albeit of varying degrees of detail – by Mistral AI and OpenAI. 

Sam Altman, CEO at OpenAI, recently wrote in a blog post that: “the average query uses about 0.34 watt-hours, about what an oven would use in a little over one second, or a high-efficiency lightbulb would use in a couple of minutes. It also uses about 0.000085 gallons of water; roughly one fifteenth of a teaspoon.” You can read my in-depth analysis of that claim here.

It is tempting to compare Gemini’s 0.24 Wh per prompt to ChatGPT’s 0.34 Wh, but the numbers are not directly comparable. Gemini’s number is the median, while ChatGPT’s is the average (arithmetic mean, I would venture). Even if they were both medians or means, we could not necessarily conclude that Google is more energy efficient than OpenAI, because we don’t know anything about the prompts being measured. It could be that OpenAI’s users ask questions that require more reasoning, or simply ask longer questions or elicit longer answers. 

According to Mistral AI’s life cycle assessment, a 400-token response from their Large 2 model emits 1.14 gCO₂e and uses 45 mL of water. 

Conclusion

So, is Google’s disclosure greenwashing or genuine progress? I hope I have equipped you to make up your own mind about that question. In my view, it is progress, because it widens the scope of what’s measured and gives us data from real infrastructure. But it also falls short, because the omissions are as important as the inclusions. Another thing to keep in mind is that these numbers often sound digestible, but they don’t tell us much about systemic impact. Personally, however, I am optimistic that we are currently witnessing a wave of AI impact disclosures from big tech, and I would be surprised if Anthropic is not up next. 


That’s it! I hope you enjoyed the story. Let me know what you think!

Follow me for more on AI and sustainability, and feel free to connect with me on LinkedIn.
