that the latest AI controversy revolves around the use of copyrighted literary works to train sophisticated language models. These chatbots, designed to mimic human-like responses, rely on vast amounts of written content. But the sources of these training materials have largely remained a mystery, raising eyebrows and concerns in the literary community.
Stephen King, Zadie Smith, and Michael Pollan are among a growing list of authors whose works, they claim, have been used without permission. The essence of the debate is not just about copyright infringement but also about the transparency and ethics surrounding the development of AI.
Upwards of 170,000 books, the majority published in the past 20 years, are in LLaMA’s training data. In addition to work by Silverman, Kadrey, and Golden, nonfiction by Michael Pollan, Rebecca Solnit, and Jon Krakauer is being used, as are thrillers by James Patterson and Stephen King and other fiction by George Saunders, Zadie Smith, and Junot Díaz. These books are part of a dataset called “Books3,” and its use has not been limited to LLaMA.
Although Facebook’s LLaMA AI model is seemingly trained on copyrighted material, the company takes a harsh stance on others using its own copyrighted material. Meta’s proprietary stance with LLaMA suggests that the company thinks similarly about its own work.