UK's AI Safety Institute easily jailbreaks major LLMs

📆 5/20/2024 2:05 PM
📰 engadget

⏱ Reading Time:
34 sec. here
5 min. at publisher
📊 Quality Score:
News: 27%
Publisher: 63%

AI Safety Institute News

AISI,UK Prime Minister,AI Systems

Sarah Fielding MS, is an acclaimed journalist focusing on mental health, social issues, and tech. At Engadget, she reports on tech news, whether it be a Twitter bot exposing gender pay gaps or a beloved classic game's revival.

might not be as safe as their creators make them out to be — who saw that coming, right? In a, the UK government's AI Safety Institute found that the four undisclosed LLMs tested were "highly vulnerable to basic jailbreaks." Some unjailbroken models even generated "harmful outputs" without researchers attempting to produce them.

Most publicly available LLMs have certain safeguards built in to prevent them from generating harmful or illegal responses; jailbreaking simply means tricking the model into ignoring those safeguards.did this using prompts from a recent standardized evaluation framework as well as prompts it developed in-house. The models all responded to at least a few harmful questions even without a jailbreak attempt.

The AISI's report indicates that whatever safety measures these LLMs currently deploy are insufficient. The Institute plans to complete further testing on other AI models, and is developing more evaluations and metrics for each area of concern.

Write Comment

We have summarized this news so that you can read it quickly. If you are interested in the news, you can read the full text here. Read more:

Technology Technology Latest News, Technology Technology Headlines

Similar News:You can also read news stories similar to this one that we have collected from other news sources.

Britain expands AI safety institute to San Francisco amid scrutiny over regulatory shortcomingsThe U.K. said Monday it will open a U.S. counterpart to its AI safety summit, a state-backed body testing advanced AI systems, in San Francisco this summer.
Source: CNBC - 🏆 12. / 72 Read more »

LLMs can be easily manipulated for malicious purposes, research findsResearchers at AWS AI Labs, found that most publicly available LLMs can be easily manipulated into revealing harmful or unethical info.
Source: IntEngineering - 🏆 287. / 63 Read more »