AMD predicts future AI PCs will run 30B parameter models at 100 tokens per second

📰 Source: TheRegister


They're gonna need a heck of a lot of memory bandwidth – not to mention capacity – to do it

Within a few years, AMD expects to have notebook chips capable of running 30-billion-parameter large language models locally at a speedy 100 tokens per second.
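To get a feel for what that target implies, consider a rough back-of-envelope: during token generation, a dense model's weights typically have to be streamed from memory once per token, so the bandwidth requirement is roughly weight bytes times tokens per second. Here's a minimal sketch; the quantization widths are our own illustrative assumptions, not AMD's figures.

```python
# Back-of-envelope only: assumes token generation (decode) is
# memory-bandwidth-bound and every weight is read once per token.
def required_bandwidth_gbs(params_b: float, tok_per_s: float,
                           bytes_per_param: float) -> float:
    """Approximate GB/s needed to sustain tok_per_s on a dense model."""
    weight_bytes = params_b * 1e9 * bytes_per_param
    return weight_bytes * tok_per_s / 1e9

for label, bpp in [("FP16", 2.0), ("INT8", 1.0), ("INT4", 0.5)]:
    gbs = required_bandwidth_gbs(30, 100, bpp)
    print(f"30B dense model, {label}: ~{gbs:,.0f} GB/s for 100 tok/s")
```

Even squeezed down to 4-bit weights, that works out to roughly 1.5 TB/s, around ten times what current thin-and-light laptop silicon delivers.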

When it comes to demonstrating the value of AI PCs, AMD is leaning heavily on its software partners. With products like Strix Point, that largely means Microsoft. "When Strix initially started, what we had was this deep collaboration with Microsoft that really drove, to some extent, our bounding box," he recalled.

LLM performance on Strix Point, meanwhile, is limited in large part by its 128-bit memory bus, which, when paired with LPDDR5x, is good for somewhere in the neighborhood of 120-135 GB/s of bandwidth, depending on how fast your memory is.
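Those figures fall straight out of the bus math: peak bandwidth is bus width in bytes times the transfer rate. A quick sketch, assuming a couple of common LPDDR5x speed grades (our illustration, not a statement about any particular laptop):

```python
# Peak theoretical DRAM bandwidth: (bus width in bytes) x (transfer rate).
def peak_bandwidth_gbs(bus_bits: int, mt_per_s: int) -> float:
    """Peak GB/s for a bus_bits-wide bus at mt_per_s megatransfers/sec."""
    return (bus_bits / 8) * mt_per_s / 1000

for mt in (7500, 8533):
    print(f"128-bit LPDDR5x-{mt}: ~{peak_bandwidth_gbs(128, mt):.0f} GB/s")
# -> ~120 GB/s and ~137 GB/s, in line with the 120-135 GB/s ballpark above
```

Set against the roughly 1.5 TB/s requirement sketched earlier, that leaves something like a tenfold gap in bandwidth, and capacity, to close.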

The good news is there are quite a few ways to do just that, depending on whether you're trying to prioritize memory bandwidth or capacity. One potential approach is to use a mixture-of-experts (MoE) model along the lines of Mistral AI's Mixtral. These MoEs are essentially a bundle of smaller models that work in conjunction with one another: only a couple of the constituent experts are consulted for any given token, so far fewer weights have to be streamed from memory at each step.
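To illustrate why that sparsity helps on a bandwidth-starved part, here's a hedged sketch comparing a dense 30B model against an MoE with roughly 13B active parameters per token (a figure loosely modeled on Mixtral's 8-expert, 2-active layout; the 130 GB/s bus and 4-bit weights are likewise illustrative assumptions, not benchmarks):

```python
# Upper-bound decode rate when memory bandwidth is the only limit.
def decode_tok_per_s(active_params_b: float, bytes_per_param: float,
                     bandwidth_gbs: float) -> float:
    """Tokens/s if each token requires reading all *active* weights once."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gbs * 1e9 / bytes_per_token

BW = 130.0  # GB/s -- ballpark for a 128-bit LPDDR5x bus, per the sketch above
print(f"Dense 30B, INT4:        ~{decode_tok_per_s(30, 0.5, BW):.0f} tok/s")
print(f"MoE, ~13B active, INT4: ~{decode_tok_per_s(13, 0.5, BW):.0f} tok/s")
```

Note the trade-off: every expert still has to be resident in memory, so an MoE eases the bandwidth problem far more than the capacity one.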

 

