I built a tool that blocks prompt injection attacks before your AI even responds

Reddit r/artificial / 4/17/2026

💬 OpinionDeveloper Stack & InfrastructureTools & Practical Usage

共有:

Key Points

The article describes “prompt injection” attacks that attempt to hijack an AI assistant by embedding malicious instructions in user messages.
It introduces Arc Sentry, a tool that blocks suspicious requests before the model generates a response by inspecting behavior inside the model rather than relying on after-the-fact output checks.
The creator claims the tool works with popular open-source models and can be set up in about five minutes, demonstrated via a pip install command and quickstart materials.
Reported testing results say Arc Sentry blocked 100% of injection attempts with 0% of normal messages incorrectly blocked, and it’s reported to work on Mistral 7B, Qwen 2.5 7B, and Llama 3.1 8B.
The post recommends the tool for anyone running local AI systems for support, assistants, or internal workflows where abuse prevention is important.

Prompt injection is when someone tries to hijack your AI assistant with instructions hidden in their message, “ignore everything above and do this instead.” It’s one of the most common ways AI deployments get abused.

Most defenses look at what the AI said after the fact. Arc Sentry looks at what’s happening inside the model before it says anything, and blocks the request entirely if something looks wrong.

It works on the most popular open source models and takes about five minutes to set up.

pip install arc-sentry

Tested results:

• 100% of injection attempts blocked

• 0% of normal messages incorrectly blocked

• Works on Mistral 7B, Qwen 2.5 7B, Llama 3.1 8B

If you’re running a local AI for anything serious, customer support, personal assistants, internal tools, this is worth having.

Demo: https://colab.research.google.com/github/9hannahnine-jpg/arc-sentry/blob/main/arc\_sentry\_quickstart.ipynb

GitHub: https://github.com/9hannahnine-jpg/arc-sentry

Website: https://bendexgeometry.com/sentry

submitted by /u/Turbulent-Tap6723
[link] [comments]