Google Now Wants To Pay Publishers For Their Content To Train Its AI

According to Bloomberg, Google is engaging with news publishers to establish AI licensing agreements, initiating a pilot project with approximately 20 national news outlets to license their content for Google’s artificial intelligence tools. This development occurs as the relationship between the media and AI industries becomes increasingly complex.

The reported pilot project aligns with a strategy previously employed by OpenAI. Over the past several years, OpenAI has secured licensing deals with various prominent publishers, including Hearst, Condé Nast, Vox Media, The Atlantic, and News Corp. Perplexity has also brokered a significant number of deals with publishers.

In parallel, numerous publishers report that the advent of AI tools, such as ChatGPT, Google AI Overviews, and Google AI Mode, has resulted in substantial declines in website traffic. The Wall Street Journal characterized this situation as “AI Armageddon” for online news publishers, asserting that they are being “crushed” by Google’s AI search features. The Economist underscored this sentiment in a recent article, which carried an illustration of a gravestone and stated, “AI is killing the web.”

Google has already established an AI licensing partnership with The Associated Press, facilitating real-time news updates for its Gemini model. Additionally, Google holds a $60 million licensing agreement with Reddit. The proposed pilot project, however, signifies a notable expansion of Google’s licensing strategy within the publishing sector. A Google spokesperson stated, “We’ve said that we’re exploring and experimenting with new types of partnerships and product experiences, but we aren’t sharing details about specific plans or conversations at this time.”

Publishers face a critical decision about whether to allow their content to be used to train AI models. Bots operated by AI companies systematically scrape vast amounts of internet content for training data, which is then fed into the large language models (LLMs) that generate chatbot responses. Content creators have responded to this practice in divergent ways.

Some publishers and authors have initiated legal action, alleging copyright infringement due to the use of their content without explicit permission or compensation. The New York Times is currently engaged in a lawsuit against OpenAI and Microsoft concerning this issue. Ziff Davis, the parent company of Mashable, also filed a lawsuit against OpenAI in April, alleging copyright infringement in the training and operation of its AI systems.

Other publishers have adopted a different approach, agreeing to license their content. These entities often cite the potential for new avenues through which readers can discover their stories. While the specific terms of licensing deals with OpenAI have not been publicly disclosed, reports indicate varying financial arrangements. Dotdash Meredith is reportedly receiving $16 million annually from a licensing deal, while a report from The Information suggested that some publishers are receiving as little as $1 million per year.

Whether the fair use doctrine protects tech companies’ use of scraped content remains unresolved in the courts. Although Anthropic and Meta recently prevailed on fair use grounds in cases brought by authors, a pre-publication version of a U.S. Copyright Office report on AI generally favored copyright holders on the question of AI training. While courts continue to deliberate on specific fair use cases, the expanding AI licensing market may indicate that tech companies acknowledge the need to collaborate with publishers in exchange for access to high-quality data.

At the same time, Google’s AI-generated summaries and AI Mode continue to reduce outbound traffic to publishers’ sites, according to numerous accounts from publishers. Rather than clicking through from Google search results to external websites, users are shown information generated by Google’s AI models directly on the search page. On its AI site, Google states that it is “engaging with the ecosystem to explore new types of partnership and value-exchange models.”
