Copyright infringement lawsuits against artificial intelligence (AI) companies are multiplying so quickly that they are becoming difficult to track. In the latest, filed last week, a group of authors is suing tech giant NVIDIA. The authors contend that NVIDIA used their books without permission to train NeMo, its AI platform that lets enterprises develop and train their own chatbots, as reported by Ars Technica.
The authors, Abdi Nazemian, Brian Keene, and Stewart O’Nan, have requested a trial by jury. They are demanding that NVIDIA pay damages and destroy all copies of the Books3 dataset, which they assert was used to train NeMo’s large language models (LLMs). They further assert that this dataset contains a ‘shadow library’ known as Bibliotik, made up of 196,640 illegally copied books.
In their claim, the authors state: “NVIDIA has confirmed that it trained its NeMo Megatron models on a duplicate of ‘The Pile’ dataset. As a result, because ‘Books3’ is part of ‘The Pile,’ NVIDIA must have also trained its NeMo Megatron models on a copy of ‘Books3.’ Some of the books written by the plaintiffs were included in ‘Books3,’ making NVIDIA liable for copyright infringement through the training of its NeMo Megatron models on their works.”
NVIDIA firmly denied the accusations in a statement to The Wall Street Journal: “We respect the rights of all content creators and believe we created NeMo in full compliance with copyright law.”
This lawsuit follows a similar one that non-fiction authors filed last year against OpenAI and Microsoft, accusing the tech companies of profiting from their works without compensation. More suits have been filed since then, including claims from news outlets such as The Intercept and Raw Story, as well as a pioneering lawsuit from The New York Times.