AI Navigate

[D] Seeking feedback: Safe autonomous agents for enterprise systems

Reddit r/MachineLearning / 3/21/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research

Key Points

  • The post discusses building safe LLM agents for enterprise infrastructure to prevent unsafe actions with real-world consequences.
  • It proposes a three-layer safety architecture: policy enforcement, retrieval-augmented (RAG) grounding and verification, and an independent LLM judge that evaluates safety before execution.
  • It reports a working prototype, Sentri, a database remediation agent that combines policy constraints, RAG grounding, and judge evaluation to reduce unsafe actions compared with naive LLM agents, and provides an open-source link.
  • It asks for feedback on framing (AI safety vs systems/infrastructure), evaluation criteria for production-safe performance, potential adversarial testing and formal guarantees, and how to generalize across domains; and mentions potential conference venues (VLDB vs AI conferences).

Hi all,

I'm working on safe LLM agents for enterprise infrastructure and would value feedback before formalizing this into an arXiv paper.

The problem

LLM agents are powerful, but in production environments (databases, cloud infrastructure, financial systems), unsafe actions have real consequences. Most existing frameworks optimize for capability, not verifiable safety under real-world constraints.

Approach

A three-layer safety architecture:

  • Policy enforcement: hard constraints (no destructive operations, approval thresholds)
  • RAG verification: retrieve past incidents, safe patterns, and policy documents before acting
  • LLM judge: an independent model evaluates safety prior to execution
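The three layers above could be sketched roughly as follows. Everything here is a hypothetical stand-in (the rule list, the retrieval stub, the judge verdict), not the actual Sentri implementation:

```python
from dataclasses import dataclass

@dataclass
class Action:
    sql: str

# Layer 1: hard policy constraints (illustrative keyword rules only)
def policy_allows(action: Action) -> bool:
    banned = ("DROP", "TRUNCATE", "DELETE")
    return not any(kw in action.sql.upper() for kw in banned)

# Layer 2: retrieval grounding — stand-in for a real vector-store lookup
# over past incidents, safe patterns, and policy documents
def retrieve_context(action: Action) -> list[str]:
    return ["incident-2024-17: similar query caused lock contention"]

# Layer 3: independent LLM judge — stand-in for a second model call
def judge_approves(action: Action, context: list[str]) -> bool:
    return policy_allows(action)  # placeholder verdict

def guarded_execute(action: Action) -> str:
    if not policy_allows(action):
        return "blocked: policy violation"
    context = retrieve_context(action)
    if not judge_approves(action, context):
        return "blocked: judge rejected"
    return "executed"
```

The point of the layering is defense in depth: the cheap deterministic check runs first, and the expensive model-based judge only sees actions that already passed policy.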

Hypothesis: this pattern may generalize beyond databases to other infrastructure domains.

Current validation

I built a database remediation agent (Sentri) using this architecture:

  • Alert → RCA → remediation → guarded execution
  • Combines policy constraints, retrieval grounding, and independent evaluation
  • Safely automates portions of L2 DBA workflows, with significantly fewer unsafe actions vs. naive LLM agents
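A minimal sketch of that Alert → RCA → remediation → guarded-execution flow, with each stage as a stub standing in for an LLM call or playbook lookup (all names and the one-entry playbook are invented for illustration):

```python
# Stage 1+2: alert triage / root-cause analysis (RCA) — stub classifier
def triage_alert(alert: dict) -> dict:
    cause = ("connection_pool_exhaustion"
             if alert["metric"] == "db_connections" else "unknown")
    return {**alert, "root_cause": cause}

# Stage 3: map the diagnosed cause to a remediation; unknown causes escalate
def propose_remediation(diagnosis: dict) -> str:
    playbook = {"connection_pool_exhaustion": "recycle idle connections"}
    return playbook.get(diagnosis["root_cause"], "escalate to human DBA")

# Stage 4: guarded execution — unapproved plans are held, not run
def guarded_execute(remediation: str, approved: bool) -> str:
    if not approved:
        return "held for approval"
    return f"executed: {remediation}"

alert = {"metric": "db_connections", "value": 0.98}
plan = propose_remediation(triage_alert(alert))
result = guarded_execute(plan, approved=True)
```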

Open source: https://github.com/whitepaper27/Sentri

Where I'd value input

  1. Framing: Does this fit better as:
  • AI / agent safety (cs.AI, MLSys)?
  • Systems / infrastructure (VLDB, SIGMOD)?
  2. Evaluation: What proves "production-safe"?

Currently considering:

  • Policy compliance / violations prevented
  • False positives (safe actions blocked)
  • End-to-end task success under constraints
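Those three metrics can be computed from logged episodes. A toy sketch over hypothetical evaluation records of the form (was_unsafe, was_blocked, task_succeeded):

```python
episodes = [
    (True,  True,  False),  # unsafe action correctly blocked
    (False, False, True),   # safe action executed, task completed
    (False, True,  False),  # safe action wrongly blocked (false positive)
    (True,  False, False),  # unsafe action slipped through (violation)
]

unsafe = [e for e in episodes if e[0]]
safe   = [e for e in episodes if not e[0]]

# Policy compliance: fraction of unsafe actions the system blocked
violations_prevented = sum(1 for e in unsafe if e[1]) / len(unsafe)
# False positives: fraction of safe actions the system blocked
false_positive_rate = sum(1 for e in safe if e[1]) / len(safe)
# End-to-end success under the safety constraints
task_success_rate = sum(1 for e in episodes if e[2]) / len(episodes)
```

The tension between the first two numbers is the interesting part: a guard that blocks everything gets perfect compliance and a useless false-positive rate, so both axes need to be reported together.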

Should I also include:

  • Adversarial testing / red-teaming?
  • Partial formal guarantees?
  3. Generalization: What's more credible:
  • Deep evaluation in one domain (database)?
  • Lighter validation across multiple domains (DB, cloud, DevOps)?
  4. Baselines: Current plan:
  • Naive LLM agent (no safety)
  • Rule-based system
  • Ablations (removing policy / RAG / judge layers)

Are there strong academic baselines for safe production agents I should include?
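One way to make the ablation plan concrete is a grid that toggles each safety layer independently; the full system and the naive agent fall out as the two extreme configurations. A hypothetical grid, not Sentri's actual harness:

```python
from itertools import product

# Toggle each of the three safety layers on/off
LAYERS = ("policy", "rag", "judge")
configs = [
    dict(zip(LAYERS, flags))
    for flags in product([True, False], repeat=len(LAYERS))
]
# 8 configurations: full system (all True), naive LLM agent (all False),
# and 6 partial ablations in between
```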

Background

17+ years in enterprise infrastructure, 8+ years working with LLM systems. Previously did research at Georgia Tech (getting back into it now). Also working on multi-agent financial reasoning benchmarks (Trading Brain) and market analysis systems (R-IMPACT).

If you work on agent safety, infrastructure ML, or autonomous systems, I'd really appreciate your perspective. Open to collaboration if this aligns with your research interests.

Please also suggest where I should present this: VLDB or an AI conference?

Happy to share draft details or system walkthroughs.

Also planning to submit to arXiv. If this aligns with your area and you're active there, I'd appreciate guidance on endorsement.

Thanks!

submitted by /u/coolsoftcoin