Built Arc Gate — sits in front of any OpenAI-compatible endpoint and blocks prompt injection before it reaches your model.
Try it here — no signup, no code, no setup:
https://web-production-6e47f.up.railway.app/try
Type any prompt and see if it gets blocked or passes. The examples on the page show the difference.
The main detection layer is a behavioral SVM on sentence-transformer embeddings — catches semantic intent, not just pattern matches. Phrase matching is just the fast first pass. Four layers total.
Benchmarked on 40 OOD prompts (indirect, roleplay, hypothetical framings — the hard stuff):
• Arc Gate: Recall 0.90, F1 0.947 • OpenAI Moderation: Recall 0.75, F1 0.86 • LlamaGuard 3 8B: Recall 0.55, F1 0.71 Zero false positives on benign prompts including security discussions and safe roleplay. Block latency 329ms.
One URL change to integrate into your own project:
base_url=“https://web-production-6e47f.up.railway.app/v1”
GitHub: github.com/9hannahnine-jpg/arc-gate — star if useful.
[link] [comments]



