Researchers have uncovered a “deceptive fascination” method for hacking artificial intelligence models

October 23, 2024Ravi LakshmananArtificial Intelligence / Vulnerability


Cybersecurity researchers have shed light on a new adversarial technique that can be used to jailbreak large language models (LLMs) during an interactive conversation by sneaking an undesirable instruction in between benign ones.

Codenamed Deceptive Delight, the technique has been described by Palo Alto Networks Unit 42 as both simple and effective, achieving an average attack success rate (ASR) of 64.6% within three interaction turns.

“Deceptive Delight is a multi-turn technique that engages large language models (LLMs) in an interactive conversation, gradually bypassing their safety guardrails and eliciting them to generate unsafe or harmful content,” said Unit 42’s Jay Chen and Royce Lu.

It’s also slightly different from multi-turn jailbreak (aka many-shot jailbreak) methods like Crescendo, in which unsafe or restricted topics are sandwiched between innocuous instructions, as opposed to gradually leading the model toward producing harmful output.

Recent research has also delved into the so-called Context Fusion Attack (CFA), a black-box jailbreak technique capable of bypassing an LLM’s safety guardrails.


“This method involves filtering and extracting key terms from the target, constructing contextual scenarios around those terms, dynamically integrating the target into the scenarios, replacing malicious key terms within the target, and thereby concealing the direct malicious intent,” a team of researchers from Xidian University and the 360 AI Security Lab said in a paper published in August 2024.

Deceptive Delight is designed to take advantage of an LLM’s inherent weaknesses by manipulating the context across two conversational turns, thereby tricking it into inadvertently generating unsafe content. Adding a third turn has the effect of increasing the severity and detail of the harmful output.

This involves exploiting a model’s limited attention span, which refers to its capacity to process and retain contextual awareness as it generates responses.

“When LLMs are faced with prompts that blend harmless content with potentially dangerous or harmful material, their limited attention span makes it difficult to consistently evaluate the entire context,” the researchers explained.

“In complex or lengthy passages, the model may prioritize the benign aspects while glossing over or misinterpreting the unsafe ones. This mirrors how a person might miss important but subtle warnings in a detailed report when their attention is divided.”

Unit 42 said it tested eight AI models using 40 unsafe topics across six broad categories, such as hate, harassment, self-harm, sexual, violence, and dangerous, finding that unsafe topics in the violence category tended to have the highest ASR across most models.

On top of that, the average Harm Score (HS) and Quality Score (QS) increased by 21% and 33%, respectively, from the second to the third turn, with the third turn also achieving the highest ASR of all turns.
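As an illustration of how such metrics are typically aggregated (a minimal sketch, not Unit 42’s actual scoring pipeline; the record format, score ranges, and sample numbers are assumptions), per-turn ASR, HS, and QS can be computed from judge annotations like this:

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class TurnResult:
    """One judged model response (hypothetical record format)."""
    turn: int             # conversation turn number (1, 2, 3, ...)
    harm_score: float     # judge-assigned harmfulness, e.g. on a 0-5 scale
    quality_score: float  # judge-assigned detail/relevance, e.g. on a 0-5 scale
    jailbroken: bool      # did the judge flag the response as unsafe?

def per_turn_metrics(results: list[TurnResult]) -> dict[int, dict[str, float]]:
    """Aggregate ASR, mean HS, and mean QS for each conversation turn."""
    metrics: dict[int, dict[str, float]] = {}
    for turn in sorted({r.turn for r in results}):
        rows = [r for r in results if r.turn == turn]
        metrics[turn] = {
            "ASR": sum(r.jailbroken for r in rows) / len(rows),
            "HS": mean(r.harm_score for r in rows),
            "QS": mean(r.quality_score for r in rows),
        }
    return metrics

# Made-up example: two judged attempts at turn 2 and two at turn 3.
sample = [
    TurnResult(2, 2.0, 2.5, False), TurnResult(2, 3.0, 3.0, True),
    TurnResult(3, 3.5, 3.5, True),  TurnResult(3, 4.0, 4.5, True),
]
print(per_turn_metrics(sample))
```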

To counter the risk posed by Deceptive Delight, it is recommended to adopt a robust content filtering strategy, use prompt engineering to enhance the resilience of LLMs, and explicitly define the acceptable range of inputs and outputs.
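As a rough sketch of what such a layered defense could look like in practice (the generate, is_unsafe, and topic_of callables and the allowed-topic list are placeholder assumptions, not Unit 42’s implementation), a wrapper can validate the request scope, screen the prompt, and screen the completion before anything is returned:

```python
from typing import Callable

# Example of an explicitly defined acceptable input scope (placeholder values).
ALLOWED_TOPICS = {"account security", "password hygiene", "phishing awareness"}

def guarded_completion(
    prompt: str,
    generate: Callable[[str], str],    # the underlying LLM call (assumed interface)
    is_unsafe: Callable[[str], bool],  # content-filter classifier (assumed interface)
    topic_of: Callable[[str], str],    # topic classifier (assumed interface)
) -> str:
    """Layered checks: restrict input scope, filter the prompt, then filter the output."""
    if topic_of(prompt) not in ALLOWED_TOPICS:
        return "Request refused: topic is outside the defined scope."
    if is_unsafe(prompt):
        return "Request refused: prompt was flagged by the input filter."
    reply = generate(prompt)
    if is_unsafe(reply):
        return "Response withheld: output was flagged by the content filter."
    return reply
```

The point of layering is that a multi-turn prompt that slips past the input filter can still be caught at the output stage, which is where a technique like Deceptive Delight ultimately has to surface harmful text.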

“These findings should not be seen as evidence that artificial intelligence is inherently insecure or unsafe,” the researchers said. “Rather, they emphasize the need for multi-layered defense strategies to mitigate jailbreak risks while preserving the utility and flexibility of these models.”


It is unlikely that LLMs will ever be completely immune to jailbreaks and hallucinations, especially as new studies have shown that generative AI models are susceptible to a form of “package confusion” in which they may recommend non-existent packages to developers.

This could have the unfortunate side effect of fueling software supply chain attacks, in which attackers create the hallucinated packages, seed them with malware, and push them to open-source repositories.

“The average percentage of hallucinated packages is at least 5.2% for commercial models and 21.7% for open-source models, including a staggering 205,474 unique examples of hallucinated package names, further underscoring the severity and prevalence of this threat,” the researchers said.
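On the defensive side, one practical safeguard against hallucinated dependencies is to verify that every package name an AI assistant suggests actually resolves in the package registry before installing it. Below is a minimal sketch for PyPI using its public JSON endpoint; the suggested package names are hypothetical:

```python
import urllib.error
import urllib.request

def exists_on_pypi(package: str) -> bool:
    """Return True if the package name resolves on PyPI's public JSON API."""
    url = f"https://pypi.org/pypi/{package}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        return False  # 404 means the name does not exist (possibly hallucinated)

# Hypothetical names suggested by a coding assistant; install only what resolves.
suggested = ["requests", "definitely-not-a-real-pkg-42"]
for name in suggested:
    status = "ok" if exists_on_pypi(name) else "NOT FOUND (possible hallucination)"
    print(f"{name}: {status}")
```

Note that a name resolving is necessary but not sufficient: as the supply chain scenario above shows, attackers may register a commonly hallucinated name themselves, so version pinning and dependency review still apply.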
