“It was the best of times, it was the worst of times.” Charles Dickens' famous opening line in his novel A Tale of Two Cities seems an appropriate metaphor for today’s legal and creative landscape regarding generative AI tools.
On the one hand, GenAI offers a powerful tool that can unleash creativity in ways we cannot even conceive today. On the other hand, the training of AI models on copyrighted content scraped from the internet is creating a showdown between AI model developers and content creators. Recently, two judges in the Northern District of California issued Orders granting Summary Judgment on the “fair use” defense in the defendants’ favor. While both judges found that the copying conducted by the AI model developers was a “fair use” under US Copyright law, the two judges seem to fundamentally disagree on whether AI training on copyrighted works is generally a permissible “fair use.”
Bartz, et al. v. Anthropic PBC, 3:24-cv-05417
In this case, Judge Alsup considered whether Anthropic’s creation of digital copies of actual books, the retention of the digital copies in a digital repository, and the training of its AI model on the digital copies was a “fair use.” Anthropic had two sources for its books. First, Anthropic downloaded millions of books from pirated libraries and retained copies of these downloaded books. However, none of the pirated copies were used in training its AI model. Second, Anthropic purchased millions of actual books, made digital copies of them, and then used those digital copies to train its LLM. Judge Alsup’s “fair use” analysis applied the four “fair use” factors to the two sources of books separately (i.e., pirated v. purchased). The four non-exclusive “fair use” factors set out in Section 107 of the US Copyright Act are:
(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
(2) the nature of the copyrighted work;
(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
(4) the effect of the use upon the potential market for or value of the copyrighted work.
In analyzing these four factors, Judge Alsup found the first factor weighed heavily in favor of Anthropic’s use of the purchased books to train its LLM. He found the digitization of the books, the retention of the digital copies in a central library, and the use of the digital copies of the purchased books to train Anthropic’s LLM to be “quintessentially transformative.” In contrast, as to the pirated copies, the court held that the first factor weighed heavily in favor of the authors because the downloading and retention of those copies (even those for which Anthropic later purchased an authorized copy) was an infringement that could not be cured by the fact that the ultimate use may have been a fair use.
As to the nature of the copyrighted works, the court found this factor weighed in favor of the authors. The court found that all parties agreed that the copied works all contained expressive elements that deserved, in varying degrees, copyright protection.
In looking at the third factor, the court analyzed the copies used to train the LLM and the copies retained in the digital library separately. Judge Alsup held that the third factor weighed in favor of Anthropic as to the copies used to train the LLM. Specifically, the court found that the authors’ arguments as to this group of books were misguided because “the amount and substantiality of the portion used” looks at what portion was used that could replace the original works. Here, the authors had no evidence that their works were ever replicated by the LLM in a large enough quantity to compete with the original works. Similarly, the court held that the third factor weighed in favor of Anthropic regarding the retention of the books Anthropic purchased, digitized (destroying the original copies) and retained in a central digital library. However, the court held that the third factor weighed in favor of the authors regarding the retention of the pirated copies in the digital library. The court reasoned that Anthropic never had the right to hold the pirated books at all since the purpose of retaining those books was not to train the LLM, but to assemble a library of all the world’s books.
Finally, the court concluded that the fourth factor favored a finding of fair use for the copies used to train the LLM. The court reasoned that the authors’ argument concerning the training copies was misplaced. The authors argued the LLM could result in a flood of competing content on the market, thereby lowering the value of the original works. However, the court held that the Copyright Act is not designed to protect creators from competition, but from unfair competition (e.g., the unauthorized distribution of a copy or a derivative work). As for the purchased books digitized and retained in the library, the court held that this factor was neutral to the determination of fair use. As for the pirated copies retained in the central library, the court found that this factor points against fair use because in this instance the copying was copy for copy and clearly a displacement of the original work.
Kadrey, et al. v. Meta Platforms, Inc., 23-cv-03417-VC
In a second case out of the Northern District of California, Judge Chhabria found that the AI model training was a fair use under the facts of this specific case, but strongly hinted in dicta that this case was an exception and that in the future (with better lawyers representing plaintiffs) AI model training would not be found to be a fair use. Like Judge Alsup’s opinion in Anthropic above, Judge Chhabria found that the first fair use factor favored Meta because the use was transformative. He also found that the second factor weighed in favor of the plaintiffs and that the third factor pointed to fair use. However, it was on the fourth factor that Judge Chhabria parted ways with Judge Alsup’s reasoning.
While Judge Chhabria held that the fourth factor (effect of the use on the market) favored Meta in this case, he expounded at length in his analysis that, in his opinion, this factor should, in most cases, favor a copyright owner and defeat the fair use defense. Despite there being a plethora of case law that states that no single fair use factor is determinative, Judge Chhabria reasoned that the fourth factor is “the single most important element of fair use,” citing Harper & Row, a 40-year-old Supreme Court case. Unfortunately, this conclusion totally ignores the Supreme Court’s admonition in Warhol v. Goldsmith. In Warhol, the majority stated, “The Court has cautioned that the four statutory fair use factors may not ‘be treated in isolation, one from another. All are to be explored, and the results weighed together, considering the purposes of copyright,’” citing Campbell v. Acuff-Rose Music.
After summarily dismissing the plaintiffs’ claims that the AI training harms the market for their original works and forecloses the authors’ ability to license their works for training, Judge Chhabria goes on at length about the possibility that AI-generated works could flood the market with content like the authors’ works and, thereby, devalue the original works. This new “market dilution” theory is touted by Judge Chhabria as the key evidence that will generally point the fourth fair use factor in future plaintiffs’ favor. However, as Judge Alsup correctly pointed out in the Anthropic case, “This is not the kind of competitive or creative displacement that concerns the Copyright Act. The Act seeks to advance original works of authorship, not to protect authors against competition” (citing Sega v. Accolade, 977 F.2d 1510 (9th Cir. 1992)).
Conclusion
It is clear from these cases that, at least currently, the fair use analysis will greatly depend on the evidence put forth by the parties. It is also clear that US courts are grappling with applying copyright law to AI/ML training on scraped content. This issue will not be resolved until the US Supreme Court considers it and provides guidance to the lower courts. In the meantime, this author believes that, of the two opinions, Judge Alsup’s analysis of the four fair use factors applied to AI/ML model training is the better-reasoned and correct decision.