When RAG Chatbots Expose Their Backend: An Anonymized Case Study of Privacy and Security Risks in Patient-Facing Medical AI

arXiv cs.CL / 5/4/2026


Key Points

  • The study presents an anonymized, non-destructive security assessment of a publicly accessible patient-facing medical RAG chatbot, focusing on privacy, security, and governance risks.
  • Using LLM-assisted prompt testing followed by manual verification in browser developer tools, the researchers uncovered a critical exposure: sensitive system and RAG configuration was transmitted through client-server communication rather than kept server-side.
  • Attackers could collect detailed backend information, including system prompts, model and embedding settings, retrieval parameters, API schemas, and knowledge-base metadata, simply by inspecting browser-visible network traffic (see the sketch after this list).
  • The chatbot also violated its stated privacy guarantees: the 1,000 most recent patient-chatbot conversations, including health-related queries, were retrievable in full without authentication.
  • The authors conclude that independent security review should be mandatory before deployment, since commercial LLMs can speed up auditing but can also help adversaries exploit the same weaknesses.
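
To make the network-traffic findings concrete, here is a minimal sketch of how such an exposure can be harvested. Everything in it is hypothetical: the domain, the /api/config endpoint, and the field names are invented stand-ins for whatever the studied deployment actually returned to the browser.

```python
# Hypothetical illustration: a RAG chatbot whose frontend fetches its full
# configuration from an unauthenticated endpoint exposes everything to anyone
# who opens browser developer tools. All names below are invented.
import json
import requests

BASE = "https://chatbot.example.org"  # placeholder domain, not the studied system

# Replay the same unauthenticated request the browser issues on page load.
resp = requests.get(f"{BASE}/api/config", timeout=10)
resp.raise_for_status()
config = resp.json()

# Fields of the kind the study reports as browser-visible.
for key in ("system_prompt", "model", "embedding_model",
            "retrieval_top_k", "knowledge_base"):
    print(f"{key}: {json.dumps(config.get(key))[:120]}")
```

The point is the threat model, not the specific URL: any configuration object the frontend downloads in the clear can be replayed by a script exactly like this, with no authentication and no specialist tooling.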

Abstract

Background: Patient-facing medical chatbots based on retrieval-augmented generation (RAG) are increasingly promoted as a way to deliver accessible, grounded health information. AI-assisted development lowers the barrier to building them, but they still demand rigorous security, privacy, and governance controls.

Objective: To report an anonymized, non-destructive security assessment of a publicly accessible patient-facing medical RAG chatbot and to identify governance lessons for the safe deployment of generative AI in health.

Methods: We used a two-stage strategy. First, Claude Opus 4.6 supported exploratory prompt-based testing and the formulation of structured vulnerability hypotheses. Second, candidate findings were manually verified with Chrome Developer Tools by inspecting browser-visible network traffic, payloads, API schemas, configuration objects, and stored interaction data.

Results: The LLM-assisted phase identified a critical vulnerability: sensitive system and RAG configuration appeared to be exposed through client-server communication rather than being restricted to the server side. Manual verification confirmed that ordinary browser inspection allowed collection of the system prompt, the model and embedding configuration, retrieval parameters, backend endpoints, the API schema, document and chunk metadata, knowledge-base content, and the 1,000 most recent patient-chatbot conversations. The deployment also contradicted its privacy assurances: full conversation records, including health-related queries, were retrievable without authentication.

Conclusions: Serious privacy and security failures in patient-facing RAG chatbots can be identified with standard browser tools, without specialist skills or authentication; independent review should therefore be a prerequisite for deployment. Commercial LLMs accelerated this assessment, including under a false developer persona; the assistance available to auditors is equally available to adversaries.
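
As a companion to the Results, the following sketch shows the kind of unauthenticated probe that would reveal the conversation-history exposure described above. The endpoint name, query parameter, and response schema are assumptions for illustration; the paper reports only that full records were retrievable without authentication.

```python
# Hypothetical probe: does a history endpoint hand out conversation records
# to an anonymous client? Endpoint and schema are invented for illustration.
import requests

BASE = "https://chatbot.example.org"  # placeholder domain

# Request the most recent conversations with no credentials attached.
resp = requests.get(f"{BASE}/api/conversations",
                    params={"limit": 1000}, timeout=10)

if resp.status_code == 200:
    records = resp.json()
    print(f"Exposed: {len(records)} conversation records returned without auth")
else:
    print(f"HTTP {resp.status_code}: endpoint appears to require authentication")
```

A 200 response carrying patient conversations to an anonymous client is precisely the failure mode the authors flag; requiring authentication and keeping such endpoints server-side is the baseline control.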