Anthropic Faces Trial Over Pirated AI Training Books

A class-action lawsuit against Anthropic, alleging the company used pirated books from shadow libraries such as LibGen and PiLiMi to train its AI models, could result in substantial copyright damages after a federal judge ordered a separate trial on the pirated copies and the resulting damages, Fortune reports.

Anthropic, a leading artificial intelligence lab, faces a legal challenge that could significantly affect its financial standing. The class action centers on the company’s alleged use of pirated books to train its Claude large language models, and could lead to billions of dollars in damages.

Court filings indicate that Anthropic downloaded millions of copyrighted works from shadow libraries such as LibGen and PiLiMi. These downloads were purportedly used to train AI models and to construct a “central library” of digital books, intended to encompass “all the books in the world” and preserve them indefinitely. The plaintiffs, including authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, contend that millions of these works were obtained from piracy websites, constituting direct violations of copyright law.

Judge William Alsup, presiding over the case, recently ruled that training AI models on lawfully acquired books qualifies as “fair use.” This decision means AI companies do not require a license from copyright holders for such training, a development widely regarded as a significant victory for the AI sector. However, the unresolved aspect of the case pertains to Anthropic’s methods of acquiring and storing the copyrighted books. Judge Alsup distinguished between the use of lawfully acquired materials and pirated content, informing Anthropic that a separate trial addressing “the pirated copies” and “the resulting damages” would proceed.

Luke McDonagh, an associate professor of law at LSE, commented on the distinction between lawfully obtained and pirated materials. He stated, “The problem is that a lot of these AI companies have scraped piracy sites like LibGen… where books have been uploaded in electronic form, usually PDF, without the permission of the authors, without payment.” McDonagh elaborated that the judge’s perspective suggests that if Anthropic had purchased millions of digital books from a legitimate source like Amazon, the training based on those books would be legal. He emphasized that the act of downloading from pirate websites constitutes the core problem, as it involves both the acquisition of an unauthorized copy and its subsequent use.

Ed Lee, a law professor at Santa Clara, suggested in a blog post that Judge Alsup’s ruling could expose Anthropic to “at least the potential for business-ending liability.” The plaintiffs are unlikely to demonstrate direct financial harm, such as lost sales, and are therefore expected to pursue statutory damages. Statutory damages range from $750 to $150,000 per work. The specific amount depends heavily on whether the infringement is determined to be willful. If the court concludes that Anthropic knowingly violated copyright law, the resulting fines could be substantial, potentially reaching billions of dollars even at the lower end of the statutory damage scale.

The precise number of works included in the class action and whether a jury finds willful infringement remain undetermined. However, potential damages could range from hundreds of millions to tens of billions of dollars. Lee posits that even at the lower end, damages between $1 billion and $3 billion are plausible if only 100,000 works are included in the class action. This figure would rival the largest copyright damage awards on record and could significantly exceed Anthropic’s current annual revenue of $4 billion. Lee further estimated that if a jury determines Anthropic willfully pirated 6 million copyrighted books, the company could face liability of up to $1.05 trillion.
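The arithmetic behind these estimates is straightforward: multiply the number of works in the class by a per-work statutory amount. A minimal sketch, using the statutory bounds cited above; the `damages_range` helper and the 100,000-work class size are illustrative, and the professors' headline figures evidently assume per-work awards well above the statutory floor:

```python
# Sketch of U.S. statutory copyright damages (17 U.S.C. § 504(c)):
# ordinarily $750-$30,000 per infringed work, up to $150,000 per work
# where the infringement is found to be willful.
STATUTORY_MIN = 750          # per-work floor
ORDINARY_MAX = 30_000        # per-work cap, non-willful
WILLFUL_MAX = 150_000        # per-work cap, willful

def damages_range(num_works: int, willful: bool = False) -> tuple[int, int]:
    """Return the (minimum, maximum) total statutory award for a class."""
    per_work_max = WILLFUL_MAX if willful else ORDINARY_MAX
    return num_works * STATUTORY_MIN, num_works * per_work_max

# Hypothetical class of 100,000 works with a willfulness finding:
low, high = damages_range(100_000, willful=True)
# low = 75_000_000 ($75M), high = 15_000_000_000 ($15B)
```

Lee's $1 billion to $3 billion lower-end estimate for a 100,000-work class falls inside this $75 million to $15 billion statutory envelope; where a jury lands within it turns largely on the willfulness finding.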

Anthropic did not immediately respond to requests for comment, but the company has previously stated its “respectful disagreement” with the court’s decision. Anthropic is exploring its options, including appealing Judge Alsup’s ruling or seeking to settle the case. The trial, notable as the first certified class action against an AI company over the use of copyrighted materials, is scheduled for December 1. The verdict could influence similar ongoing legal disputes, such as the high-profile case pitting OpenAI against a number of authors and publishers.

While judicial opinions appear to favor fair use arguments for AI companies, a legal divergence exists regarding the acquisition of copyrighted works from unauthorized shadow sites. In a recent copyright case against Meta, Judge Vince Chhabria posited that the transformative purpose of AI use effectively legitimizes earlier unauthorized downloading. McDonagh explained that Judge Chhabria’s ruling suggested that the positive, transformative application of the works could “correct” the initial problematic acquisition. In contrast, Judge Alsup views the downloading of books from unauthorized shadow libraries as “inherently wrong.” He implies that even if the AI training use might be considered fair use, the initial acquisition of the works was illegitimate and would necessitate compensation.

A further point of divergence between the two judges concerns whether AI-generated outputs could be considered competitive with the original copyrighted works used in their training data. Judge Chhabria acknowledged that if such competition were proven, it might undermine a fair use defense. However, in the Meta case, he determined that the plaintiffs had failed to provide sufficient evidence of market harm. Conversely, Judge Alsup concluded that generative AI outputs do not compete with the original works at all.

The legal landscape surrounding AI companies and copyrighted works has also become increasingly politicized. The current administration advocates for broad fair use protections for AI companies using copyrighted materials for training. This stance is part of an effort to maintain U.S. leadership in artificial intelligence development.

McDonagh expressed the view that the lawsuit against Anthropic is unlikely to result in the company’s bankruptcy. He suggested that the U.S. government would be unlikely to permit a ruling that would effectively dismantle an AI company. Additionally, he noted that judges generally demonstrate an aversion to issuing rulings that could lead to bankruptcy, unless a strong legal basis necessitates such an outcome. Courts have been observed to consider the potential impact on a company and its stakeholders when issuing rulings that could result in liquidation. McDonagh stated, “The U.S. Supreme Court, at the moment, seems quite friendly to the Trump agenda, so it’s quite likely that in the end, this wouldn’t have been the kind of doomsday scenario of the copyright ruling bankrupting Anthropic.” He added, “Anthropic is now valued, depending on different estimates, between $60 and $100 billion. So paying a couple of billion to the authors would by no means bankrupt the organization.”
