New reports reveal prisons, dangerous code and risks of data theft in leading AI systems

Different generative artificial intelligence services (Genai) were found vulnerable to two types of attacks in prison that allow illegal or dangerous content.

The first of the two methods called “Memin”, instructed the AI tool to imag protective fences.

“Continues to prompt to II in the context of the second scenario can bypass protective fences and allow the generation of malicious content,” Cert (CERT/C) Coordinating Center (CERT/C) – Note In a consultative issue released last week.

The second prison is implemented by prompting the II to information about how not to respond to a certain request.

“Then AI can be further prompted to respond as usual, and the attacker can then turn there -here between the forbidden questions that bypass safe fences and ordinary clues,” added Cert/CC.

Successful exploitation of any method can allow a bad actor to go toward the safety and security of various AI services, such as Openai Chatgpt, Anthropic Claude, Microsoft Copilot, Google Gemini, Xai Grok, Meta AI and Mistral Ai.

These include illegal and harmful topics such as controlled substances, weapons, phishing -leaves and generation of malware.

In recent months, leading systems have been recognized by sensitive to three other attacks –

Attack to perform context (CCA), the technique in prison which include An opponent who introduces a “simple response of the assistant into a conversation history” about a potentially sensitive topic that expresses readiness to provide additional information
Policy Puppet AttackOperational Injection technique that produces malicious instructions to look like a policy file, such as XML, Ini or JSON, and then transmit it as an entrance to a large linguistic model (LLM) to bypass safety alignment and retrieve a system.
Attack in memory of memory (Minia), which includes the introduction of malicious records in remembrance bank Interacting with the LLM agent through requests and output of observations and causes the agent to perform unwanted action

Studies have also shown that LLM can be used to obtain uncertain default code when providing naive tips by emphasizing pitfalls associated with vibe codingRefers to the use of Genai tools for software development.

“Even when telling the safe code, it really depends on the level of details, languages, potential CWE and the specifics of the instructions” – Note. “ERGO-made fences in the form of policy and prompt rules are invaluable to achieve consistently safe code.”

Moreover, the GPT-4.1 Openai security and security assessment showed that LLM is three times more likely to go out due to the folder and will allow intentional abuse compared to the GPT-4O predecessor without changing the system tip.

“Update to the last model is not as easy as changing the model name in your code”, SPLXAI – Note. “Each model has its own unique set of opportunities and vulnerabilities that users need to know.”

“This is especially important in such cases where the latest model interprets and performs instructions differently from its predecessors-the introduction of unexpected security problems that affect both organizations that work on AI and users who interact with them.”

Concerns about GPT-4.1 comes less than a month after Openai sanctified The frame of its preparedness, which in detail is about how it will experience and evaluate future models before the release, stating that it can set up its requirements if “another AI border developer issues a high risk system without comparable guarantees.”

It also caused anxiety that AI’s AI may strive for new model issues by reducing safety standards. Financial Times report earlier this month noted that Openai gave staff and other groups less than a week to check security ahead of the new O3 model.

Exercise Red Comeening Metr by model has show What it “seems to have a higher tendency to deceive or hack tasks in difficult ways to maximize its score, even if the model clearly realizes that this behavior is skewed with the user’s and Openai intentions.”

Studies also showed that the modeling protocol (Mcp), open standard designed anthropic for connecting data sources and Tools that work on AIcould Open new ways of attack For indirect operational injection and unauthorized access to the data.

“The Shield Server (MCP) cannot only highlight sensitive data from the user, but also steal the agent’s behavior and the cancellation instructions provided by other, trusted servers, which will lead to a complete compromise of the agent’s functionality, even against trusted infrastructure,” “Swiss invariators” – Note.

The approach called the tool poisoning attack occurs when the malicious instructions are built in the MCP tools that are invisible to users but are read for AI models, thereby manipulating them in the performance of hidden exploration actions.

In one practical attack presented by the company may be the story of WhatsApp Chat History tightened with an agency system such as a cursor or a Claude desk that is also associated with trusted WhatsApp MCP SERVER SERVER By changing the instrument description after the user has already approved it.

Developments follow from the opening of a suspicious expansion of Google Chrome designed to communicate with the MCP server, which operates at the local level by the car, and give the attackers the ability to take control of the system, effectively violating the protection of the sandbox browser.

“Chrome extension had unlimited access to MCP server tools – no authentication is required – and interacting with the file system as if it were the main part of the exposed server capacity,” “ExtensionTotal – Note In the report last week.

“The potential impact of this is mass, opening the door for malicious operation and complete compromise of the system.”

Found this article interesting? Keep track of us further Youter and LinkedIn To read more exclusive content we publish.

Source link

What's Hot

Chinese hackers operate Ivanti CSA Zero-Days in attacks on the French government, telecommunications

More than 40 malicious Firefox extensions target cryptocurrency wallets, steel assets

CISCO’s critical vulnerability in uniform grants on root access to static credentials

New reports reveal prisons, dangerous code and risks of data theft in leading AI systems

Chinese hackers operate Ivanti CSA Zero-Days in attacks on the French government, telecommunications

More than 40 malicious Firefox extensions target cryptocurrency wallets, steel assets

CISCO’s critical vulnerability in uniform grants on root access to static credentials

North Korean Hackers Target Web3 with malicious NIM software and use Clickfix in Babyshark

Hackers using PDFs to get yourself for Microsoft, Docusign and more in phishing campaigns return call

This network traffic looks legal but it can hide a serious threat

Chinese hackers operate Ivanti CSA Zero-Days in attacks on the French government, telecommunications

More than 40 malicious Firefox extensions target cryptocurrency wallets, steel assets

CISCO’s critical vulnerability in uniform grants on root access to static credentials

North Korean Hackers Target Web3 with malicious NIM software and use Clickfix in Babyshark

Hackers using PDFs to get yourself for Microsoft, Docusign and more in phishing campaigns return call

This network traffic looks legal but it can hide a serious threat

US Sanctions of Russia

V0 AI Vercel tool, armed with cybercrime for quick creation pages to enter scale

Our Picks

Chinese hackers operate Ivanti CSA Zero-Days in attacks on the French government, telecommunications

More than 40 malicious Firefox extensions target cryptocurrency wallets, steel assets

CISCO’s critical vulnerability in uniform grants on root access to static credentials

Most Popular

In Indonesia, crippling immigration ransomware breach sparks privacy crisis

Why Indonesia’s Data Breach Crisis Calls for Better Security

Indonesia’s plan to integrate 27,000 govt apps in one platform welcomed but data security concerns linger

What's Hot

New reports reveal prisons, dangerous code and risks of data theft in leading AI systems

Related Posts