A team of computer scientists, engineers, mathematicians and cognitive scientists has developed an open-source evaluation platform called CheckMate, which allows human users to interact with and evaluate the performance of large language models (LLMs).
The researchers suggest that models which communicate uncertainty, respond well to user corrections, and can provide a concise rationale for their recommendations make better assistants. They also caution that human users of LLMs should verify outputs carefully, given the models' current shortcomings. The findings could be useful both in informing AI literacy training and in helping developers improve LLMs for a wider range of uses.
"When talking to mathematicians about LLMs, many of them fall into one of two main camps: either they think that LLMs can produce complex mathematical proofs on their own, or that LLMs are incapable of simple arithmetic," said co-first author Katie Collins from the Department of Engineering."Of course, the truth is probably somewhere in between, but we wanted to find a way of evaluating which tasks LLMs are suitable for and which they aren't.
"One of the things we found is the surprising fallibility of these models," said Collins."Sometimes, these LLMs will be really good at higher-level mathematics, and then they'll fail at something far simpler. It shows that it's vital to think carefully about how to use LLMs effectively and appropriately."