Mustafa Suleyman, the CEO of Microsoft AI, said this week that machine-learning companies can scrape most content published online and use it to train neural networks because it's essentially "freeware."
... in January alleging that they trained AI models on the authors' works without permission.
Suleyman did allow that there's another category of content, the stuff published by companies with lawyers. That's putting it mildly. While Suleyman's remarks seem certain to offend content creators, he's not entirely wrong – it's not clear where the legal lines are with regard to AI model training and model output.
In other words, those creating content and posting it online make freeware unless they retain, or can attract, attorneys willing to challenge Microsoft and its ilk.
In a paper distributed via SSRN last month, Frank Pasquale, professor of law at Cornell Tech and Cornell Law School in the US, and Haochen Sun, associate professor of law at The University of Hong Kong, explore the legal uncertainty surrounding the use of copyrighted data to train AI and whether courts will find such use fair.