New Court Filings Suggest Meta’s AI Training Data Licensing Efforts Faced Significant Challenges
New court filings in the Kadrey v. Meta Platforms copyright case are shedding light on Meta’s struggles to secure licensing deals with book publishers for AI training data. These filings corroborate earlier reports that Meta “paused” discussions with publishers, suggesting that the company encountered significant hurdles in its efforts to acquire copyrighted material for its generative AI models.
Background: The Kadrey v. Meta Case and the “Fair Use” Debate
The Kadrey v. Meta Platforms case is one of several legal battles pitting AI companies against authors and intellectual property holders. The core issue is whether copyrighted material may be used to train AI models without the permission of its owners. AI companies, including Meta, have largely argued that training on copyrighted content constitutes “fair use,” a legal doctrine that allows limited use of copyrighted material without permission from the copyright holders. Authors and other copyright holders, the plaintiffs in these cases, strongly contest this claim.
Court Filings Reveal Meta’s Licensing Struggles
Newly filed documents, including partial transcripts of Meta employee depositions, describe Meta’s difficulties in securing book licensing agreements. According to these transcripts, certain Meta staff members expressed concerns about whether negotiating such licenses could scale.
Low Engagement and Lack of Publisher Rights
Sy Choudhury, who leads Meta’s AI partnership initiatives, testified that the company’s outreach to publishers was met with “very slow uptake in engagement and interest.” Despite compiling a “long list” of potential publishers, Meta struggled to even establish contact. Choudhury stated that only a small fraction of publishers engaged with Meta’s inquiries.
Furthermore, the transcripts reveal that Meta paused its book licensing efforts in early April 2023 due to “timing” and other logistical challenges. A key issue was that many fiction publishers, a particular target for Meta, did not actually hold the rights to license the content Meta sought. Choudhury explained that these publishers often stated that they lacked the necessary rights, which would have required Meta to engage with individual authors, a process deemed too time-consuming.
Echoes of Previous Licensing Challenges
Choudhury’s testimony also indicated that Meta had encountered similar licensing roadblocks in other areas of AI development. He cited the example of attempting to license 3D worlds from game engine and game manufacturers. Faced with similarly low engagement and lack of interest, Meta opted to develop its own solutions rather than pursue licensing agreements.
Allegations of Piracy and “Shadow Libraries”
The plaintiffs, including bestselling authors Sarah Silverman and Ta-Nehisi Coates, have amended their complaint multiple times. The latest version accuses Meta of several offenses, including cross-referencing pirated books with copyrighted books available for licensing. This allegedly helped Meta determine whether pursuing a licensing agreement with a publisher was worthwhile.
The complaint further alleges that Meta used “shadow libraries” containing pirated e-books to train its AI models, including the Llama series. The plaintiffs claim that Meta may have acquired these libraries through torrenting, a method of distributing files that often involves sharing copyrighted material illegally. Taken together, these allegations suggest that Meta may have relied on illegally obtained data to train its AI models, which further complicates the legal questions surrounding AI and copyright.
Conclusion: Implications for the Future of AI Training Data
The revelations from these court filings highlight the challenges AI companies face in acquiring training data legally. Meta’s struggles to secure licensing agreements and the allegations of using pirated material underscore the complexities of navigating copyright law in the rapidly evolving field of artificial intelligence. These cases could have significant implications for the future of AI development and the legal framework governing the use of copyrighted material for AI training.