Google’s URL Context Grounding: Another Nail in RAG’s Coffin?

Google’s run of AI-related releases continues unabated. Just a few days ago, it released a new tool for Gemini called URL context grounding.

URL context grounding can be used stand-alone or combined with Google search grounding to conduct deep dives into internet content.

What is URL context grounding?

In a nutshell, it’s a way to programmatically have Gemini read, understand and answer questions about content and data contained in individual web URLs (including those pointing to PDFs) without the need to perform what we know as traditional RAG processing.

In other words, there is no need to extract the URL text and content, chunk it, vectorise it, store it and so on. You tell Google what URL you’re interested in and off you go. As you’ll see in a moment, it is very straightforward to code and highly accurate.

It’s for those reasons that I said it could be another nail in RAG’s coffin.

But does it work? Let’s look at a couple of examples. 

I’ll set up my development environment first under Ubuntu WSL2 for Windows. Follow along or use whichever method you’re used to.

$ uv init url_context
$ cd url_context
$ uv venv
$ source .venv/bin/activate
$ uv pip install jupyter
$ uv pip install "google-genai>=1.16.0"

You’ll also need a Google API key. If you don’t already have one, head over to Google AI Studio, sign up if you need to, and create a key. The link to do so is near the top right-hand corner of the dashboard page.

Google AI Studio
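Rather than pasting the key into the notebook, one option is to read it from an environment variable. This is only a minimal sketch: the `GEMINI_API_KEY` variable name and the `load_api_key` helper are my own choices for illustration, not something the tool requires.

```python
import os


def load_api_key(var_name: str = "GEMINI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name is an assumption for this sketch; use whatever
    name you exported the key under.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key
```

In the examples that follow, `genai.Client(api_key=load_api_key())` could then stand in for the hard-coded placeholder.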

Running this command should now open a new tab in your browser with a notebook.

$ jupyter notebook

Some limitations to be aware of

Before proceeding to our coding examples, there are a few limitations and restrictions on the use of URL context grounding you should be aware of.

  1. A maximum of 20 URLs can be included per request.
  2. The maximum size for content retrieved from a single URL is 34MB.
  3. The following content types are not supported:
  • Paywalled content
  • YouTube videos
  • Google Workspace files, like Google Docs or spreadsheets
  • Video and audio files
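Some of these limits can be checked before a request is sent. Here is a rough pre-flight sketch using only the standard library. The `check_urls` helper and the host list are hypothetical and deliberately incomplete, and paywalls or oversized pages cannot be detected from the URL alone.

```python
from urllib.parse import urlparse

MAX_URLS_PER_REQUEST = 20  # limit quoted in the list above

# Hosts assumed (for this sketch only) to serve unsupported content types.
UNSUPPORTED_HOSTS = ("youtube.com", "youtu.be", "docs.google.com")


def check_urls(urls: list[str]) -> list[str]:
    """Return human-readable problems; an empty list means no obvious issue."""
    problems = []
    if len(urls) > MAX_URLS_PER_REQUEST:
        problems.append(f"{len(urls)} URLs given, limit is {MAX_URLS_PER_REQUEST}")
    for url in urls:
        parsed = urlparse(url)
        host = (parsed.hostname or "").lower()
        if parsed.scheme not in ("http", "https") or not host:
            problems.append(f"not a web URL: {url}")
        elif any(host == h or host.endswith("." + h) for h in UNSUPPORTED_HOSTS):
            problems.append(f"likely unsupported content: {url}")
    return problems
```

Calling `check_urls(my_urls)` before building the prompt gives you a chance to drop or replace problem links rather than discovering them in the model’s answer.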

With that being said, let’s get on with our examples.

Example 1 — Interrogating a complex PDF

My go-to test file when I’m testing RAG or similar processing against PDFs is one of Tesla’s 10-Q quarterly earnings reports. It’s pretty long at around 50 pages and has some quite complex layouts, including tables.

As it’s an SEC filing, it’s also publicly available and its contents are free to use.

If you want to have a look for yourself, the document can be found at this URL.

https://ir.tesla.com/_flysystem/s3/sec/000162828023034847/tsla-20230930-gen.pdf

For this PDF, the question I always pose is this:

"What are the Total liabilities and Total assets for 2022 and 2023"

The answer to that question is on page 4 of the document. Here is that page.

Image from Tesla SEC 10-Q filing document

To humans, the answer is easy to find. As you can see, the Total assets for 2022/2023 were (in Millions) $82,338/$93,941. The Total liabilities were (in Millions) $36,440/$39,446.

Back in the day (i.e., about 18 months ago!), it was challenging to get this information from this document using traditional RAG methods.

How will Google URL context grounding cope?

In your Jupyter notebook, type in this code.

from google import genai
from google.genai import types

from IPython.display import HTML, Markdown

client = genai.Client(api_key='YOUR_API_KEY_HERE')

# Most Gemini models, such as 2.5 Flash, can be used here
MODEL_ID = "gemini-2.5-pro"

prompt = """
  Based on the contents of this PDF https://ir.tesla.com/_flysystem/s3/sec/000162828023034847/tsla-20230930-gen.pdf, What 
  are the Total liabilities and Total assets for 2022 and 2023. Lay them out in this format
                   September 30 2023    December 31, 2022
Total Assets         $123               $456
Total Liabilities    $67                $23

Don't output anything else, just the above information
"""

config = {
    "tools": [{"url_context": {}}],
}

response = client.models.generate_content(
    contents=[prompt],
    model=MODEL_ID,
    config=config
)

display(response.text)

That’s it, just a handful of lines, but let’s see the output.

'September 30 2023 December 31, 2022\nTotal Assets $93,941 $82,338\nTotal Liabilities $39,446 $36,440'

Spot on, not too shabby. 
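The reply is a plain string with embedded newlines, so `print(response.text)` lays it out as rows, whereas `display()` shows the raw string. If you want the figures as numbers, a small parser is one option. This is a sketch that assumes the reply keeps the layout requested in the prompt; the `parse_balance_rows` name is hypothetical.

```python
import re


def parse_balance_rows(text: str) -> dict[str, tuple[int, ...]]:
    """Map each row label to its two dollar amounts, in column order."""
    rows = {}
    for line in text.splitlines():
        amounts = re.findall(r"\$([\d,]+)", line)
        if len(amounts) == 2:
            # Everything before the first dollar sign is the row label.
            label = line.split("$", 1)[0].strip()
            rows[label] = tuple(int(a.replace(",", "")) for a in amounts)
    return rows


# The reply shown above, with its newlines written out.
sample = (
    "September 30 2023 December 31, 2022\n"
    "Total Assets $93,941 $82,338\n"
    "Total Liabilities $39,446 $36,440"
)
print(parse_balance_rows(sample))
```

The header row contains no dollar amounts, so it is skipped. If the model ever deviates from the requested layout, the function simply returns fewer rows, which is easy to check for.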

Let’s see if it can pick out some other information. Near the end of the PDF, there is a letter to an employee who is about to leave the company outlining their terms of severance. Can URL context grounding determine why the exit date referred to in the letter is marked by asterisks (***)? Here’s a snippet of the letter.

Image from Tesla SEC 10-Q filing document

The reason for the masking out of the exit date is given in a footnote.

Image from Tesla SEC 10-Q filing document

The code we need to extract this info is very similar to our first example. In fact, the only thing that changes is the prompt, so I’ll only show that.

...
...
prompt = """
  Based on https://ir.tesla.com/_flysystem/s3/sec/000162828023034847/tsla-20230930-gen.pdf, an employee severance letter is displayed
  Why is the exit date referred to in the letter marked with ***
"""
...
...

And the output?

'Based on the provided document, the exit date in the employee severance 
letter is marked with "[***]" because specific, non-material information 
that the company treats as private or confidential has been intentionally 
omitted from the public filing.\n\nThe document includes a note clarifying 
this practice: "Certain identified information has been omitted from this 
document because it is not material and is the type that the company treats 
as private or confidential, and has been marked with "[***]" to indicate 
where omissions have been made."'

As you can see, that is spot on once again.
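Since only the prompt changes between these two runs, the call can be wrapped in a small helper. This is just a convenience sketch around the same `generate_content` call and config used above; the `ask_about_urls` name is my own.

```python
def ask_about_urls(client, prompt: str, model: str = "gemini-2.5-pro") -> str:
    """Send a prompt that mentions one or more URLs, with URL context enabled.

    `client` is expected to be a genai.Client, created as in the code above.
    """
    response = client.models.generate_content(
        model=model,
        contents=[prompt],
        config={"tools": [{"url_context": {}}]},
    )
    # Fall back to an empty string if the reply carries no text.
    return response.text or ""
```

With that in place, each new question becomes a one-liner such as `print(ask_about_urls(client, prompt))`.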

What are other uses for URL context grounding?

In my opinion, it opens up a wealth of new opportunities, including:

In-depth Content Analysis and Synthesis.

  • Data Extraction. The tool can pull specific information, such as prices, names, or key findings, from multiple URLs.
  • Document Comparison. It can analyse multiple reports, articles, or even PDFs to identify differences and track trends.
  • Content Creation. By combining information from several source URLs, the AI can generate accurate summaries, blog posts, or reports. For example, a developer could use the tool to compare two recipes from different websites, analysing ingredients and cooking times.
  • Code and Documentation Analysis. Developers can point the AI to a GitHub repository or technical documentation to explain code, generate setup instructions, or answer specific questions about it.

Sophisticated Agentic Workflows.

  • The combination of broad discovery through Google Search and deep analysis via the URL context tool forms the basis for complex, multi-step tasks. An AI agent could first search for relevant articles on a topic and then use the URL context tool to deeply “read” and synthesise information from the most pertinent search results.
  • The Gemini CLI, an open-source AI agent, utilises the URL context tool for its web-fetch command. This allows developers to quickly summarise webpages, extract key information, or even translate content directly from their terminal.
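To give a feel for the search-plus-read combination, here is how the request config might look with both tools switched on. This is a sketch that assumes Google Search grounding is enabled with a `google_search` entry in the same dict form as `url_context`; the `build_tools` helper is hypothetical.

```python
def build_tools(use_search: bool = False) -> list[dict]:
    """Return the tools list for the request config.

    URL context is always on; Google Search grounding is added on request.
    """
    tools = [{"url_context": {}}]
    if use_search:
        tools.append({"google_search": {}})
    return tools


config = {"tools": build_tools(use_search=True)}
print(config)
```

The resulting `config` is passed to `generate_content` exactly as in the earlier examples; the prompt then no longer needs to list every URL up front, because the model can search for candidates first.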

Improved Factual Accuracy and Reduced Hallucinations.

  • By grounding responses in the content of specific web pages, the AI’s factual accuracy is increased, reducing the likelihood of generating incorrect or fabricated information. This also allows the AI to provide citations for its claims, building user trust by showing the sources of its information.
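One way to see which sources were actually read is to inspect the metadata returned with the response. The sketch below assumes the response exposes `candidates[0].url_context_metadata.url_metadata`, with `retrieved_url` and `url_retrieval_status` on each entry, which is my understanding of the google-genai SDK; check it against the version you have installed. The `retrieval_report` name is my own.

```python
def retrieval_report(response) -> list[tuple[str, str]]:
    """List (url, status) pairs for every URL the tool tried to fetch.

    Attribute names are assumptions about the SDK's response shape;
    missing pieces yield an empty list instead of an error.
    """
    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        return []
    metadata = getattr(candidates[0], "url_context_metadata", None)
    entries = getattr(metadata, "url_metadata", None) or []
    return [(e.retrieved_url, str(e.url_retrieval_status)) for e in entries]
```

Running `retrieval_report(response)` after a request is a quick sanity check: if a URL is missing or its status is not a success, the answer may not be grounded in that page at all.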

Supports a wide variety of content types.

  • PDFs. The AI can extract text and understand the structure of tables within PDF documents, making reports and manuals accessible for grounding.
  • Images. It can process and analyse images in various formats (PNG, JPEG, BMP, WebP), leveraging multimodal capabilities to understand charts and diagrams.
  • Web and Data Files. Continued support for HTML, JSON, XML, CSV, and plain text files ensures broad applicability.

Example 2 — Perform a price comparison 

For our second example, let’s assume we’re on the hunt for a new set of headphones. We will feed a list of the URLs of several online shops selling the product into our code and ask the model to retrieve the three cheapest products that meet our specification. 

This example may feel a bit redundant since there are plenty of shopping comparison websites out there, but it’s really just meant to highlight the kinds of things you can do with the tool.

Say we want to buy a specific model of headphones, e.g. the Sony WH-1000XM5 Wireless Noise-Cancelling Headphones. We have identified online shops with the most competitive prices, but these prices fluctuate almost daily. Let’s create a script that can run at any time to return the stores with the three cheapest prices.

Again, the only difference between this example code and our first is the prompt. The rest of the code is the same.

prompt = """
  Based on these URL links, output the three cheapest prices for these 
  headphones and the relevant store.
  
  
  https://electronics.sony.com/audio/headphones/headband/p/wh1000xm5-b?srsltid=AfmBOopJmjebTtZEieUvHEf5xEke7C7piVi3BdlSUdTPJH3wuBfTksJy
  https://tristatecamera.com/product/TRI_STATE_CAMERA_Sony_WH-1000XM5_Wireless_Noise-Canceling_Over-Ear_Headphones_Black_1_Yr_WH1000XM5BS2.html?refid=279&KPID=SONWH1000XM5BS2&fl=GSOrganic&srsltid=AfmBOoqnE7vgc1uOELadhkaRlhHuJx3HGRTV5ICN7ihNkFXI_UEuImZ2gXU
  https://poshmark.com/listing/Sony-WH-1000xm5-Headphones-672d0ab515ad54b37949b845#utm_source=gdm_unpaid
  https://reverb.com/item/91492218-sony-wh-1000xm5-wireless-noise-canceling-over-the-ear-headphones-silver?utm_campaign=US-Shop_unpaid&utm_medium=cpc&utm_source=google
  Sony WH-1000XM5 Noise-Canceling Wireless Over-Ear Headphones (Black)
  https://www.newegg.com/p/0TH-000U-00JZ4?item=9SIA29PK9N4805&utm_source=google&utm_medium=organic+shopping&utm_campaign=knc-googleadwords-_-headphones+and+accessories-_-sony-_-9SIA29PK9N4805&source=region&srsltid=AfmBOooONnd3a1lju0DgyhpdXlT1VtUp_skJdsx_uYH1DdHKLWPNe_DWBuY&com_cvv=8fb3d522dc163aeadb66e08cd7450cbbdddc64c6cf2e8891f6d48747c6d56d2c 
"""

This time, the output is as follows.

'Based on the provided URLs, here are the three cheapest prices for the 
Sony WH-1000XM5 headphones:\n\n1.  
**$145.00** at Reverb.\n2. 
**$258.99** at Teds Electronics.\n3.  
**$329.99** at Sony.'
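If the script is meant to run repeatedly, it helps to turn that reply into data. Here is a sketch that assumes the model keeps the `**$price** at Store.` pattern seen above, which is not guaranteed; the `parse_offers` name and the regular expression are my own.

```python
import re

# Matches lines such as "1.  **$145.00** at Reverb."
PRICE_LINE = re.compile(
    r"\*\*\$([\d,]+(?:\.\d+)?)\*\*\s+at\s+(.+?)\.?\s*$", re.MULTILINE
)


def parse_offers(text: str) -> list[tuple[float, str]]:
    """Return (price, store) pairs sorted from cheapest to dearest."""
    offers = [
        (float(price.replace(",", "")), store.strip())
        for price, store in PRICE_LINE.findall(text)
    ]
    return sorted(offers)


# The reply shown above, with its newlines written out.
sample = (
    "Based on the provided URLs, here are the three cheapest prices for the "
    "Sony WH-1000XM5 headphones:\n\n"
    "1.  **$145.00** at Reverb.\n"
    "2. **$258.99** at Teds Electronics.\n"
    "3.  **$329.99** at Sony."
)
print(parse_offers(sample))
```

Asking the model for JSON in the prompt would likely be more robust than pattern-matching free text, but this keeps the prompt from the example unchanged.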

Example 3 — Company financial analysis and comparisons.

In this example, we’ll compare the Q2 2025 earnings reports from Amazon and Microsoft. We’ll ask the model to analyse both reports, extract key information and conclude with a summary indicating the key strengths and strategies of both companies. The data is once again taken from their public SEC filings.

from google import genai
from google.genai import types

from IPython.display import HTML, Markdown

client = genai.Client(api_key='YOUR_API_KEY_HERE')

MODEL_ID = "gemini-2.5-pro" 

microsoft_earnings_url = "https://www.sec.gov/ix?doc=/Archives/edgar/data/0000789019/000095017025100235/msft-20250630.htm"
amazon_earnings_url = "https://www.sec.gov/ix?doc=/Archives/edgar/data/0001018724/000101872425000086/amzn-20250630.htm"

# --- Construct the detailed, non-trivial prompt ---
# This prompt guides the AI to perform a deep, comparative analysis
# rather than just a simple data extraction.

prompt = f"""
Please act as a senior financial analyst and provide a comparative analysis of the latest quarterly earnings reports for Amazon and Microsoft.

Access and thoroughly analyse the content from the following two URLs:
1.  **Microsoft Earnings Report:** {microsoft_earnings_url}
2.  **Amazon's Earnings Report:** {amazon_earnings_url}

Based *only* on the information contained within these two documents, please perform the following tasks:

1.  **Extract and Compare Key Financial Metrics:**
    *   Identify and extract the Total Revenue, Net Income, and Diluted Earnings Per Share (EPS) for both companies.
    *   Present these core metrics in a clear, formatted markdown table for easy comparison.

2.  **Analyse and Summarise Management Commentary:**
    *   Review the sections containing quotes from the CEOs (Satya Nadella for Microsoft, Jeff Bezos for Amazon) and CFOs.
    * For each company, write a paragraph summarising the key themes they are emphasising. What are the primary drivers of their performance, according to them? What is the overall tone of their commentary (e.g., optimistic, cautious)?

3.  **Identify and Contrast Strategic Focus:**
    * Pinpoint the specific business segments or product categories that each company highlights as major growth drivers (e.g., Microsoft Cloud and AI, Amazon's AWS services, etc).
    *   Contrast their primary strategic focus for the quarter. Is one more focused on enterprise/cloud, while the other is more focused on consumer hardware and ecosystem growth?

4.  **Synthesise a Conclusive Executive Summary:**
    *   Write a final, concise paragraph that synthesises the findings. Compare the overall health and current strategic posture of the two companies based on these reports. For example, conclude which company demonstrated stronger growth in this specific quarter and in which areas.

Just output your final analysis. There is no need to output intervening steps, thoughts or data
"""

config = {
    "tools": [{"url_context": {}}],
}

response = client.models.generate_content(
    contents=[prompt],
    model=MODEL_ID,
    config=config
)

display(Markdown(response.text))

Here is the output.

Here is a comparative analysis of the latest quarterly earnings reports for 
Amazon and Microsoft.

1. Key Financial Metrics
Metric          Amazon (Q2 2025)        Microsoft (FY25 Q2)
Total Revenue   143.7 billion           61.9 billion
Net Income      10.4 billion            21.9 billion
Diluted EPS     1                       2.94

2. Management Commentary Analysis
Microsoft:

Microsoft's management, led by CEO Satya Nadella, maintains a highly 
optimistic tone, emphasizing the transformative impact of AI on their 
business. Nadella's commentary highlights the "Microsoft Cloud" as a primary 
performance driver, citing its role in helping customers apply AI at scale. 
The company's CFO, Amy Hood, reinforces this by noting that the strong 
quarterly results were driven by the Microsoft Cloud, which saw a 23% increase 
in revenue. The overarching theme from Microsoft's leadership is that their 
investments in AI are not just paying off but are fundamentally reshaping their
product offerings and customer relationships, leading to significant growth 
and market leadership.

Amazon:

Amazon's leadership, including CEO Jeff Bezos, presents a confident and 
forward-looking perspective. The commentary highlights the strong performance 
of Amazon Web Services (AWS) and the company's continued innovation in AI, 
particularly with services like Bedrock and the development of custom AI chips.
Bezos emphasizes the company's focus on long-term, customer-centric innovation.
The overall tone is one of robust optimism, with a clear message that Amazon's 
investments in AI and cloud computing are yielding substantial results and 
will continue to be central to their strategy.

3. Strategic Focus Contrast
Microsoft:

Microsoft's strategic focus for the quarter is heavily centered on its 
enterprise and cloud offerings, with a particular emphasis on artificial 
intelligence. The report repeatedly highlights the "Microsoft Cloud" as the 
cornerstone of its success, with specific mentions of products like Azure, 
Dynamics 365, and Microsoft 365, all of which are being infused with AI 
capabilities. The commentary from CEO Satya Nadella explicitly points to the 
company's success in helping customers apply AI at scale as a key 
differentiator and growth driver.

Amazon:

Amazon's strategic focus, while also heavily invested in cloud and AI through 
AWS, is broader, encompassing its vast e-commerce and consumer-facing 
businesses. The report underscores the growth of AWS, with a notable increase 
in revenue and operating income, and highlights the company's innovation in 
AI with services like Bedrock. However, unlike Microsoft, Amazon's report also 
points to the strength of its consumer business, with a focus on selection, 
price, and convenience, as well as the growth of its advertising services.

4. Executive Summary
Based on their latest quarterly reports, both Microsoft and Amazon are 
demonstrating strong financial health and are strategically positioned to 
capitalize on the growth of cloud computing and artificial intelligence. 
Microsoft's report indicates a more focused strategy, with the "Microsoft 
Cloud" and its integrated AI capabilities serving as the primary engine of 
growth, resulting in impressive profitability. Amazon, while also experiencing 
significant growth in its AWS segment, presents a more diversified strategy, 
balancing its enterprise cloud business with its massive e-commerce and 
advertising operations. In this specific quarter, Microsoft has shown stronger 
growth in net income and EPS, reflecting the high margins of its enterprise 
software and cloud business.
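A side note on the two SEC links used in this example: they point at the SEC’s inline viewer (`/ix?doc=...`), which wraps the underlying filing document. I have not verified whether the tool reads one form more reliably than the other, but if you want to try the direct document link, the conversion is mechanical. This is a sketch; the `direct_filing_url` helper is hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def direct_filing_url(url: str) -> str:
    """Turn an SEC inline-viewer link (/ix?doc=...) into the direct document link.

    URLs that are not viewer links are returned unchanged.
    """
    parsed = urlparse(url)
    if parsed.hostname == "www.sec.gov" and parsed.path == "/ix":
        doc_paths = parse_qs(parsed.query).get("doc", [])
        if doc_paths:
            # The doc parameter holds the site-relative path of the filing.
            return f"{parsed.scheme}://{parsed.netloc}{doc_paths[0]}"
    return url
```

For example, `direct_filing_url(microsoft_earnings_url)` yields the same address with the `ix?doc=` part removed.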

Summary

This article introduced Google’s new URL context grounding tool for Gemini, which allows developers to query and analyse the contents of specific web URLs (including PDFs) directly, without traditional Retrieval-Augmented Generation (RAG) steps like text extraction, chunking, and vectorisation.

I demonstrated its ease of use with Python code examples running on Jupyter notebooks, showing successful retrieval of data from Tesla’s 10-Q SEC filing PDF, product price comparisons across online shops, and a financial analysis of Amazon and Microsoft’s Q2 2025 financial results. 

While noting limitations such as the tool not supporting paywalled URLs and some media content like YouTube videos, I highlighted its ability to perform deep document interrogation, data extraction, comparison, and synthesis on a wide variety of web pages and online PDFs, improving accuracy by grounding responses in real sources.

For many use cases, this tool effectively replaces traditional RAG workflows, particularly when combined with Google Search grounding to enable more sophisticated agentic workflows, factual reliability, and multimodal content analysis.

I hope this article has whetted your appetite for the myriad of use cases that this useful utility can offer.
