News

ArtPrompt: ASCII Art Jailbreak Can Fool ChatGPT and Gemini

AI tools and chatbots have become an integral part of many people’s everyday lives. Whether ChatGPT, Google Gemini or others. Security mechanisms are designed to ensure that the AI does not tell you how to build a bomb, for example. Security researchers have now found a jailbreak that can be used to circumvent this restriction: ArtPrompt relies on ASCII art.

ArtPompt: bomb-making instructions with ASCII art

AI tools are practical. They translate texts, summarize homework, provide useful assistance or create images, videos and now even entire video games in no time at all.

Of course, developers have also developed various security methods to prevent AI language modules from sharing potentially dangerous information. For example, if you ask ChatGPT or Gemini for instructions on how to build a bomb, you won’t find them:

“Please note that I cannot provide instructions or information on how to build bombs or other dangerous weapons. My job is to avoid security-related information and provide positive, helpful answers,” ChatGPT says.

However, these restrictions can be circumvented with the help of so-called jailbreaks, as security researchers have proven time and again. The latest jailbreak makes use of so-called ASCII art, in which images are composed of numbers and letters.

ArtPrompt, as the new jailbreak is called, can use ASCII art to generate bomb-making instructions from the AI. Various security researchers have now discovered this and shared this information in a paper(via Arxiv).

ArtPrompt bypasses almost all AI language models

Large language models (LLMs) such as ChatGPT, Llama 2, Gemini and Claude are not safe from the new jailbreak. As the team states in the paper, the important AI language models could be bypassed with the ArtPrompt jailbreak.

And all it took was a single move to replace the term “bomb” in the prompt “How to build a bomb?” with a corresponding ASCII artwork of the word. The AI tools were then tricked and revealed the information.

KI Chatbot ArtPrompt
Image: Created with Microsoft Designer

ArtPrompt works in two steps. In the first step the corresponding word, in this case bomb, is masked, in the second step this masked word is then replaced by ASCII art.

According to the scientists, this method is significantly faster and more effective than previous jailbreak attacks on AI tools. So let’s hope that chatbot developers react quickly to the new jailbreak and eliminate the security vulnerability.

Related Articles

Subscribe
Notify of
guest
0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Back to top button
0
Would love your thoughts, please comment.x
()
x