AI Developers Are Quietly Training AI Using AI-Generated Data

  • 📰 futurism


It's AI-generated, all the way down.

AI developers are increasingly investigating what's known as "synthetic data" to train their large language models for a number of reasons, not least of which is that it's apparently more cost-effective.

Beyond the relative cheapness of synthetic data, however, there is the issue of scale. Training cutting-edge LLMs is starting to use up essentially all of the human-created data that's actually available, meaning that to build even stronger models, developers are almost certainly going to need more.

"If you could get all the data that you needed off the web, that would be fantastic," Gomez said. "In reality, the web is so noisy and messy that it’s not really representative of the data that you want. The web just doesn’t do everything we need."

As the CEO noted, Cohere and other companies are already quietly using synthetic data to train their LLMs "even if it’s not broadcast widely," and others like OpenAI seem to expect to use it in the future.

During an event in May, OpenAI CEO Sam Altman quipped that he is "pretty confident that soon all data will be synthetic data," the report notes.

 
