Judge Rules AI Training on Copyrighted Books Is Fair Use, Handing Tech Companies an Early Win

In a landmark decision, a federal judge has ruled that Anthropic's use of published books to train its AI models without the authors' permission is legal under the fair use doctrine. The ruling is the first to accept AI companies' claim that fair use can shield them from liability when they use copyrighted material to train large language models (LLMs).
The decision is a significant setback for the authors, artists, and publishers who have filed numerous lawsuits against companies such as OpenAI, Meta, Midjourney, and Google. It does not bind other judges, but it gives later courts an early ruling to draw on, and that ruling favors tech companies over creators. These lawsuits typically hinge on how judges interpret the fair use doctrine, a notoriously vague exception in copyright law that has not been updated since 1976, long before the advent of the internet and of training datasets for generative AI.
Fair use determinations weigh several factors, including the purpose of the use, whether the copying is for commercial gain, and how transformative the derivative work is compared with the original. For instance, writing Star Wars fan fiction for personal enjoyment is far more likely to be treated as fair use than selling it for profit. Companies such as Meta have made similar fair use arguments to defend their use of copyrighted works for training, but before this week's ruling it was unclear how courts would decide.
In Bartz v. Anthropic, the plaintiff authors also challenged how Anthropic obtained and stored their works. According to the lawsuit, Anthropic sought to build a "central library" containing "all the books in the world" and to keep them "permanently." Millions of these copyrighted books, however, were downloaded for free from pirate sites, which is clearly illegal. Even so, the judge found that Anthropic's training on these materials constituted fair use. The question of the "central library" itself remains open, and the court will take it up separately.
The judge wrote that the court will rule on the pirated copies used to create Anthropic's central library and on the harm they caused. The judge added that Anthropic's later purchase of a book it had previously taken from the internet does not absolve it of the theft, though it may affect the amount of statutory damages. The ruling underscores the complex interplay between copyright law and the rapidly evolving field of AI, and it could shape future legal battles in this area.
