Built a prompt injection proxy that beats OpenAI Moderation and LlamaGuard — see it block attacks live

Reddit r/artificial / 4/30/2026

💬 OpinionDeveloper Stack & InfrastructureSignals & Early TrendsTools & Practical UsageModels & Research

共有:

Key Points

Arc Gate is a proxy layer that sits in front of any OpenAI-compatible endpoint to block prompt-injection attempts before they reach the model.
The system uses a multi-layer detection approach, including a behavioral SVM built on sentence-transformer embeddings to catch semantic intent beyond simple phrase pattern matching.
In benchmarking on 40 hard out-of-distribution prompts, Arc Gate achieved higher recall and F1 scores than OpenAI Moderation and LlamaGuard 3 8B.
The project reports zero false positives on benign prompts (including security discussions and safe roleplay) with an average block latency of 329ms, and provides an integration option via a single URL change.
Users can try the service instantly via a public URL and integrate it into their own projects using the provided base_url parameter, with the code hosted on GitHub.

Built Arc Gate — sits in front of any OpenAI-compatible endpoint and blocks prompt injection before it reaches your model.

Try it here — no signup, no code, no setup:

https://web-production-6e47f.up.railway.app/try

Type any prompt and see if it gets blocked or passes. The examples on the page show the difference.

The main detection layer is a behavioral SVM on sentence-transformer embeddings — catches semantic intent, not just pattern matches. Phrase matching is just the fast first pass. Four layers total.

Benchmarked on 40 OOD prompts (indirect, roleplay, hypothetical framings — the hard stuff):

• Arc Gate: Recall 0.90, F1 0.947 • OpenAI Moderation: Recall 0.75, F1 0.86 • LlamaGuard 3 8B: Recall 0.55, F1 0.71

Zero false positives on benign prompts including security discussions and safe roleplay. Block latency 329ms.

One URL change to integrate into your own project:

base_url=“https://web-production-6e47f.up.railway.app/v1”

GitHub: github.com/9hannahnine-jpg/arc-gate — star if useful.

submitted by /u/Turbulent-Tap6723
[link] [comments]

💡 Insights using this article

This article is featured in our daily AI news digest — key takeaways and action items at a glance.

📅 4/30DailyView insight →

Black Hat USA

AI Business

Remote agents in Vibe. Powered by Mistral Medium 3.5.ProductIntroducing Mistral Medium 3.5, remote coding agents in Vibe, plus new Work mode in Le Chat for complex tasks.

Mistral AI Blog

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.

Dev.to

15 Lead Magnet Ideas That Actually Convert in 2026

Dev.to

1.14.4a2

CrewAI Releases

Built a prompt injection proxy that beats OpenAI Moderation and LlamaGuard — see it block attacks live

Key Points

💡 Insights using this article

Related Articles

Black Hat USA

Remote agents in Vibe. Powered by Mistral Medium 3.5.ProductIntroducing Mistral Medium 3.5, remote coding agents in Vibe, plus new Work mode in Le Chat for complex tasks.

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.

15 Lead Magnet Ideas That Actually Convert in 2026

1.14.4a2

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer