AI ‘gold rush’ for chatbot training data could run out of human-written text

📆 6/6/2024 7:24 PM
📰 adndotcom

A new study projects that tech companies will exhaust the supply of publicly available training data for AI language models by roughly the turn of the decade.

By Matt O'Brien, Associated Press

Photo: Traffic on Interstate 35 passes a Microsoft data center on Sept. 5, 2023, in West Des Moines, Iowa.

In the short term, tech companies like ChatGPT-maker OpenAI and Google are racing to secure and sometimes pay for high-quality data sources to train their AI large language models – for instance, by signing deals to tap into the steady flow of sentences coming out of Reddit forums and news media outlets.

The researchers first made their projections two years ago — shortly before ChatGPT’s debut — in a working paper that forecast a more imminent 2026 cutoff of high-quality text data. Much has changed since then, including new techniques that enabled AI researchers to make better use of the data they already have and sometimes “overtrain” on the same sources multiple times.

The amount of text data fed into AI language models has been growing about 2.5 times per year, while computing has grown about 4 times per year, according to the Epoch study. Facebook parent company Meta Platforms recently claimed the largest version of their upcoming Llama 3 model — which has not yet been released — has been trained on up to 15 trillion tokens, each of which can represent a piece of a word.
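To make those growth figures concrete, here is a minimal Python sketch of the compound-growth arithmetic. It uses the 15 trillion token figure and the 2.5 times per year growth rate quoted above. The 300 trillion token stock is a hypothetical placeholder, not a number reported in this article, and the function name `years_until_exhausted` is made up for illustration.

```python
import math


def years_until_exhausted(current_tokens: float,
                          stock_tokens: float,
                          growth_per_year: float) -> float:
    """Years of compound growth before training-set size reaches the stock."""
    if current_tokens >= stock_tokens:
        return 0.0
    # Solve current * growth**t = stock for t.
    return math.log(stock_tokens / current_tokens) / math.log(growth_per_year)


current = 15e12  # 15 trillion tokens, the Llama 3 figure quoted above
growth = 2.5     # yearly growth factor for text data, per the Epoch study
stock = 300e12   # HYPOTHETICAL total stock of text; an assumption, not from the article

years = years_until_exhausted(current, stock, growth)
print(f"{years:.1f} years until training sets match the assumed stock")
```

Because the growth is multiplicative, the result is not very sensitive to the assumed stock: making `stock` ten times larger pushes the answer out by only about two and a half years.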

If real human-crafted sentences remain a critical AI data source, those who are stewards of the most sought-after troves (websites like Reddit and Wikipedia, as well as news and book publishers) have been forced to think hard about how they’re being used. AI companies should be “concerned about how human-generated content continues to exist and continues to be accessible,” said Selena Deckelmann, chief product and technology officer at the Wikimedia Foundation, which runs Wikipedia.
