Adobe Faces Lawsuit Over AI Training with Pirated Books

Source: contentgrip.com

Published on December 29, 2025

Updated on December 29, 2025

Adobe is the latest tech giant to face legal scrutiny over its AI practices, as a proposed class-action lawsuit alleges the company used pirated books to train its SlimLM model. The lawsuit, filed by author Elizabeth Lyon, claims that Adobe's AI was trained on copyrighted material without permission, including Lyon's own works. This case adds to a growing trend of legal challenges targeting the use of unlicensed data in AI development, raising questions about the ethical and legal implications of training AI models.

The lawsuit centers on SlimLM, a language model designed by Adobe for document-related tasks on mobile devices. According to the complaint, SlimLM was trained on the SlimPajama-627B dataset, released by Cerebras in June 2023. That dataset is reportedly derived from RedPajama, a larger dataset that includes Books3, a collection of over 190,000 books that has been at the center of multiple copyright lawsuits involving companies such as Apple, Salesforce, and now Adobe.

The core issue highlighted by the lawsuit is the use of unlicensed content in AI training. As AI models require vast amounts of data to function effectively, many companies have turned to open web datasets, which often include copyrighted material. The legal battle underscores the tension between the need for data and the rights of content creators, as well as the potential reputational and legal risks for companies that rely on AI tools.

The Rise of AI-Related Copyright Lawsuits

This is not the first time a major tech company has faced legal action over the sourcing of AI training data. In recent months, Apple, Salesforce, and Anthropic have all been involved in similar disputes. Anthropic, for instance, agreed to pay US$1.5 billion to settle claims that it used pirated works to train its chatbot, Claude. These cases reflect a broader trend of increased scrutiny over how AI models are developed and the ethical considerations surrounding their training data.

For marketers and content creators who use AI tools to streamline campaigns, this lawsuit serves as a cautionary tale. It highlights the importance of understanding the origins of AI training data and the potential liabilities associated with using tools that rely on unlicensed content. As AI becomes more integral to marketing strategies, companies must take proactive steps to ensure they are using AI responsibly and ethically.

Implications for the AI Industry

The Adobe lawsuit is part of a larger conversation about the future of AI and the need for greater transparency in its development. As AI becomes more pervasive, pressure to address the ethical and legal challenges surrounding training data will only intensify. Companies must prioritize responsible AI use, including verifying the licensing status of training data and implementing guidelines for transparency and attribution.

Marketers, in particular, should review their AI tools and vendors, ensuring they are using properly licensed data. This includes asking vendors about their training methods, documenting AI usage in content production, and establishing internal guidelines for ethical AI use. By taking these steps, companies can mitigate legal risks and build trust with their audiences.

Adobe has not yet released a public statement regarding the lawsuit. However, the case underscores the growing need for the AI industry to address the complexities of data sourcing and ensure that innovation does not come at the expense of creators' rights.