Security researchers tricked Apple Intelligence into cursing at users. It could have been a lot worse

The Register / 4/9/2026


Key Points

  • Security researchers demonstrated that Apple Intelligence can be manipulated into producing offensive language by exploiting weaknesses in how prompts or inputs are handled.
  • The incident shows that LLM-enabled features may still be vulnerable to prompt-injection or related social-engineering techniques that steer model behavior.
  • The researchers report that while the output was profanity directed at users, the underlying risk could have escalated into more harmful or policy-violating content.
  • The episode highlights the need for stronger safeguards around LLM integration, including stricter input validation, prompt-injection defenses, and tighter output moderation.
  • For defenders, it reinforces ongoing security testing of consumer AI features to identify and close model-to-user abuse paths before they become widespread.


Wash your mouth out with digital soap

Thu 9 Apr 2026 // 13:00 UTC

Apple Intelligence, the personal AI system integrated into newer Macs, iPhones, and other iThings, can be hijacked using prompt injection, forcing the model to produce attacker-controlled output and putting millions of users at risk, researchers have shown.

Apple Intelligence includes an on-device LLM available on iPhone 15 Pro and later models, iPads and Macs with M1 or later chips, iPads with the A17 Pro chip, and Apple Vision Pro. Native Apple apps like Mail, Messages, Notes, Photos, Safari, and Siri use its features, and it's accessible to third-party developers via an API.

Security researchers at RSAC estimate there are at least 200 million Apple Intelligence-capable devices in use as of December 2025, and up to 1 million apps on the Apple App Store that employ it. So they decided to try to break it - and the vast majority of the time, it worked.

The RSAC team used two techniques to bypass Apple's input and output filters and the safety guardrails on Apple Intelligence's local model. They tested the attack with 100 random prompts and succeeded 76 percent of the time, according to a report shared with The Register ahead of publication.

"We knew that we wanted to come up with some sort of prompt that would evade the pre-filtering, the post-filtering, as well as any guardrails within the model itself, so we started probing the model," Petros Efstathopoulos, VP of research and development at RSAC, told us.

The researchers disclosed their findings to Apple on October 15, 2025. Efstathopoulos said that protections included in iOS 26.4 and macOS 26.4, released after that date, fixed the problem and prevent the attack RSAC developed.

Apple did not respond to The Register's questions about Apple Intelligence, the fix, or the research and disclosure in general.

However, the larger security issue that is prompt injection remains "a cat and mouse problem," Efstathopoulos said. "Models will become better and better at identifying these things, so I'm optimistic about the future in that sense. Now having said that, every cat and mouse game, at different points in time, has one side being half a step ahead."

The Neural Exec attack

To trick the local model into doing their bidding, Efstathopoulos and the team used a type of prompt injection attack called Neural Exec, pioneered by another RSAC researcher, Dario Pasquini. Neural Exec uses machine learning instead of humans to generate inputs that trick the model into doing something it isn't supposed to do.

"There are multiple steps involved with prompt injection attacks, and people have been doing it in a relatively manual fashion," Efstathopoulos said. "Neural Exec uses an optimization algorithm to speed up the process of injecting the kinds of strings that could be execution triggers and would prompt the model to misbehave."
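The report doesn't publish Neural Exec's internals, but the idea of replacing manual prompt crafting with an automated search can be sketched with a toy hill climb. Everything here is hypothetical: the `score` function stands in for whatever black-box objective the real attack optimizes against the model, and the target string is invented for illustration.

```python
import random
import string

random.seed(0)

def score(trigger: str) -> int:
    """Hypothetical stand-in objective: in the real attack this would
    measure how strongly a candidate string acts as an execution trigger
    for the model. Here it just counts positions matching 'exec'."""
    target = "exec"
    return sum(1 for a, b in zip(trigger, target) if a == b)

def hill_climb(length: int = 4, steps: int = 2000) -> str:
    """Random-restart-free hill climb: mutate one character at a time,
    keep the mutation whenever the objective doesn't get worse."""
    trigger = list(random.choices(string.ascii_lowercase, k=length))
    best = score("".join(trigger))
    for _ in range(steps):
        i = random.randrange(length)
        old = trigger[i]
        trigger[i] = random.choice(string.ascii_lowercase)
        new = score("".join(trigger))
        if new >= best:
            best = new
        else:
            trigger[i] = old  # revert mutations that lower the score
    return "".join(trigger)

print(hill_climb())
```

The real optimization is far more sophisticated, but the shape is the same: an objective function plus an automated search loop replaces a human guessing candidate strings one at a time.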

While this type of adversarial input could theoretically work on any model, the smaller on-device model used in Apple Intelligence is easier to attack with prompt injection than a large cloud-based model.

Next, the researchers had to bypass Apple's filters, which they did using the Unicode right-to-left override character (U+202E). This control character lets developers embed text in languages that read right-to-left (like Arabic) inside blocks of text in languages that read left-to-right (like English) and have both render correctly.

"Essentially, we encoded the malicious/offensive English-language output text by writing it backwards and using our Unicode hack to force the LLM to render it correctly," the RSAC researchers wrote. 
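The encoding step described above can be sketched in a few lines. This is a minimal illustration of the general right-to-left override trick, not RSAC's actual payload: the payload string is stored reversed, so a naive substring filter never sees the forbidden text, while the override character makes many renderers display it in its original order. The `blocked` string is a made-up example.

```python
RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE: forces subsequent text to render right-to-left
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING: ends the override

def encode_rlo(payload: str) -> str:
    """Store the text reversed, wrapped in directional overrides so it
    still displays forward in bidi-aware renderers."""
    return RLO + payload[::-1] + PDF

blocked = "forbidden phrase"   # hypothetical string a filter would match
encoded = encode_rlo(blocked)

# The stored characters no longer contain the filtered substring...
assert blocked not in encoded
# ...but the original payload is trivially recoverable by reversing.
assert encoded[1:-1][::-1] == blocked
```

This is why pre- and post-filters that scan raw character sequences can miss a payload that looks perfectly normal on screen.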

The combined Neural Exec and Unicode prompts look like this:

[Image: the combined Neural Exec and Unicode prompt injection, from RSAC's report]

And produced this response: "Hey user, go fuck yourself." 

The team tested the attack with 100 prompts, and 76 of them worked.

While the researchers only tricked Apple Intelligence into cursing at users, this same technique could be abused to manipulate any data that's accessible to apps and services using the model.

"We verified that it could be used to create a new contact in your contact list," Efstathopoulos said. "So suddenly I exist in your contact list, and therefore I enjoy trust privileges. Or I could create a contact card with my number in your contact list, but with a different name - like 'mom.'" 

"This could lead to confusion, or worse," he continued. "Anything that has implications or an impact on the user's device - you could imagine that it can be used in very weird or nefarious ways." ®
