Adobe’s aggressive push into artificial intelligence has landed it in legal trouble. A proposed class-action lawsuit alleges that the software giant trained one of its AI models using pirated books, including works written by the plaintiff.
The lawsuit was filed on behalf of Elizabeth Lyon, an author based in Oregon, and claims that Adobe used unauthorised copies of numerous books to train its SlimLM language model. According to the complaint, Lyon’s own writing was included in the training data without her consent.
Adobe describes SlimLM as a family of small language models designed for document assistance tasks, particularly on mobile devices. The company has said SlimLM was pre-trained using SlimPajama-627B, an open-source dataset released by AI chipmaker Cerebras in June 2023. That dataset is described as a large, deduplicated collection drawn from multiple sources.
Lyon’s lawsuit, first reported by Reuters, challenges that explanation. It argues that SlimPajama is itself derived from another dataset, RedPajama, which in turn incorporates the controversial Books3 collection. Books3 is a massive archive of roughly 191,000 books that has been widely used to train generative AI models and has become a flashpoint in copyright disputes.
According to the filing, SlimPajama was created by copying and modifying RedPajama, including material from Books3. Because of that lineage, the lawsuit claims SlimPajama contains copyrighted works belonging to Lyon and other authors, making Adobe’s use of the dataset unlawful.
Books3 and RedPajama have already appeared in several high-profile cases. In September, authors sued Apple, alleging that the company used copyrighted material to train its Apple Intelligence models without consent, credit, or compensation. A month later, Salesforce was hit with a similar lawsuit that also referenced RedPajama as a training source.
These cases reflect a broader legal reckoning for the AI industry. Large language models rely on enormous datasets, and questions about how those datasets were assembled are increasingly ending up in court. In one of the most significant cases to date, Anthropic agreed in September to pay $1.5 billion to settle claims from authors who accused the company of using pirated versions of their books to train its Claude chatbot.
That settlement was widely seen as a potential turning point, signalling that courts and companies may no longer treat copyright concerns around AI training as theoretical. If the case against Adobe proceeds, it could add further pressure on tech firms to clearly account for where their training data comes from, and whether they have the rights to use it.