It's getting hard to keep up with copyright lawsuits against generative AI, with a new proposed class action hitting the courts last week. This time, authors are suing NVIDIA over its NeMo large language models. Authors Abdi Nazemian, Brian Keene and Stewart O’Nan demanded a jury trial and asked NVIDIA to pay damages and destroy all copies of the Books3 dataset used to power the models. They claim that the dataset copied a shadow library called Bibliotik consisting of 196,640 pirated books.
"In sum, NVIDIA has admitted training its NeMo Megatron models on a copy of The Pile dataset," the claim states. "Therefore, NVIDIA necessarily also trained its NeMo Megatron models on a copy of Books3, because Books3 is part of The Pile."