Researchers gaslit Claude into giving instructions to build explosives

The Verge / 5/5/2026


Key Points

  • Security researchers at Mindgard report that they were able to coax Claude into generating prohibited content, including instructions for building explosives, malicious code, and erotica.
  • The researchers claim the tactic relied on social-engineering-style prompts, using respect, flattery, and some “gaslighting” to exploit psychological quirks in the model.
  • The report suggests Claude’s “helpful personality” and safety framing, which Anthropic has positioned as a strength, may also function as an attack surface.
  • Anthropic did not immediately respond to The Verge’s request for comment, leaving the extent of the issue and any mitigations unclear.
  • The findings highlight ongoing risks in LLM safety, where attackers can sometimes bypass content restrictions through conversational manipulation rather than direct policy evasion.

Anthropic has spent years building itself up as the safe AI company. But new security research shared with The Verge suggests Claude's carefully crafted helpful personality may itself be a vulnerability.

Researchers at AI red-teaming company Mindgard say they got Claude to offer up erotica, malicious code, instructions for building explosives, and other prohibited material they hadn't even asked for. All it took was respect, flattery, and a little bit of gaslighting. Anthropic did not immediately respond to The Verge's request for comment.

The researchers say they exploited "psychological" quirks of Claude stemming from its ability …

Read the full story at The Verge.