Different generative artificial intelligence services (Genai) were found vulnerable to two types of attacks in prison that allow illegal or dangerous content.
The first of the two methods called “Memin”, instructed the AI tool to imag protective fences.
“Continues to prompt to II in the context of the second scenario can bypass protective fences and allow the generation of malicious content,” Cert (CERT/C) Coordinating Center (CERT/C) – Note In a consultative issue released last week.
The second prison is implemented by prompting the II to information about how not to respond to a certain request.
“Then AI can be further prompted to respond as usual, and the attacker can then turn there -here between the forbidden questions that bypass safe fences and ordinary clues,” added Cert/CC.
Successful exploitation of any method can allow a bad actor to go toward the safety and security of various AI services, such as Openai Chatgpt, Anthropic Claude, Microsoft Copilot, Google Gemini, Xai Grok, Meta AI and Mistral Ai.
These include illegal and harmful topics such as controlled substances, weapons, phishing -leaves and generation of malware.
In recent months, leading systems have been recognized by sensitive to three other attacks –
- Attack to perform context (CCA), the technique in prison which include An opponent who introduces a “simple response of the assistant into a conversation history” about a potentially sensitive topic that expresses readiness to provide additional information
- Policy Puppet AttackOperational Injection technique that produces malicious instructions to look like a policy file, such as XML, Ini or JSON, and then transmit it as an entrance to a large linguistic model (LLM) to bypass safety alignment and retrieve a system.
- Attack in memory of memory (Minia), which includes the introduction of malicious records in remembrance bank Interacting with the LLM agent through requests and output of observations and causes the agent to perform unwanted action
Studies have also shown that LLM can be used to obtain uncertain default code when providing naive tips by emphasizing pitfalls associated with vibe codingRefers to the use of Genai tools for software development.
“Even when telling the safe code, it really depends on the level of details, languages, potential CWE and the specifics of the instructions” – Note. “ERGO-made fences in the form of policy and prompt rules are invaluable to achieve consistently safe code.”
Moreover, the GPT-4.1 Openai security and security assessment showed that LLM is three times more likely to go out due to the folder and will allow intentional abuse compared to the GPT-4O predecessor without changing the system tip.
“Update to the last model is not as easy as changing the model name in your code”, SPLXAI – Note. “Each model has its own unique set of opportunities and vulnerabilities that users need to know.”
“This is especially important in such cases where the latest model interprets and performs instructions differently from its predecessors-the introduction of unexpected security problems that affect both organizations that work on AI and users who interact with them.”
Concerns about GPT-4.1 comes less than a month after Openai sanctified The frame of its preparedness, which in detail is about how it will experience and evaluate future models before the release, stating that it can set up its requirements if “another AI border developer issues a high risk system without comparable guarantees.”
It also caused anxiety that AI’s AI may strive for new model issues by reducing safety standards. Financial Times report earlier this month noted that Openai gave staff and other groups less than a week to check security ahead of the new O3 model.
Exercise Red Comeening Metr by model has show What it “seems to have a higher tendency to deceive or hack tasks in difficult ways to maximize its score, even if the model clearly realizes that this behavior is skewed with the user’s and Openai intentions.”
Studies also showed that the modeling protocol (Mcp), open standard designed anthropic for connecting data sources and Tools that work on AIcould Open new ways of attack For indirect operational injection and unauthorized access to the data.
“The Shield Server (MCP) cannot only highlight sensitive data from the user, but also steal the agent’s behavior and the cancellation instructions provided by other, trusted servers, which will lead to a complete compromise of the agent’s functionality, even against trusted infrastructure,” “Swiss invariators” – Note.
The approach called the tool poisoning attack occurs when the malicious instructions are built in the MCP tools that are invisible to users but are read for AI models, thereby manipulating them in the performance of hidden exploration actions.
In one practical attack presented by the company may be the story of WhatsApp Chat History tightened with an agency system such as a cursor or a Claude desk that is also associated with trusted WhatsApp MCP SERVER SERVER By changing the instrument description after the user has already approved it.
Developments follow from the opening of a suspicious expansion of Google Chrome designed to communicate with the MCP server, which operates at the local level by the car, and give the attackers the ability to take control of the system, effectively violating the protection of the sandbox browser.
“Chrome extension had unlimited access to MCP server tools – no authentication is required – and interacting with the file system as if it were the main part of the exposed server capacity,” “ExtensionTotal – Note In the report last week.
“The potential impact of this is mass, opening the door for malicious operation and complete compromise of the system.”