(Links to all files, apps, and repos mentioned in this post can be found in the 'full post' link at the bottom)

Agents for document redaction and review tasks

Document redaction tasks involve text and vision capabilities, plus long-context understanding to review and redact each page of a long document. Privacy is also key, which gives a strong incentive to use local, open-source models where possible. In this post (linked at the bottom), I investigate the possibility of using agent workflows to conduct end-to-end redaction and review tasks, comparing open- and closed-source options.

To do this, skill files were developed based on agentic use of the open-source Document Redactions app/package (repo linked below) to redact and review documents. This package contains a Gradio UI app that exposes a number of FastAPI endpoints for document redaction and review functions. The agents used a deployment of this app on Hugging Face Spaces. The instructions given to the agents were chosen to present a range of complex requirements that may reflect a real-life redaction task:
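To make the skill-file idea concrete, here is a minimal sketch of how an agent might drive a redaction app's FastAPI endpoints on a Hugging Face Spaces deployment. The base URL, endpoint path, and payload fields below are illustrative assumptions, not the real API of the Document Redactions package.

```python
# Hypothetical sketch: an agent-side helper for calling a redaction app's
# FastAPI endpoints. The URL, "/redact" path, and field names are assumed
# for illustration only.
import json
from urllib import request

BASE_URL = "https://example-redaction.hf.space"  # hypothetical Spaces deployment


def build_redact_payload(pdf_name: str, ocr_engine: str = "tesseract") -> dict:
    """Assemble a JSON body for a hypothetical /redact endpoint."""
    return {
        "file": pdf_name,
        "ocr_engine": ocr_engine,  # e.g. "tesseract" or "paddle"
        "entities": ["PERSON", "EMAIL_ADDRESS", "SIGNATURE"],
    }


def call_endpoint(path: str, payload: dict) -> bytes:
    """POST a JSON payload to the deployed app (network call; sketch only)."""
    req = request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return resp.read()
```

An agent skill file would wrap calls like `call_endpoint("/redact", build_redact_payload("document.pdf"))` with instructions on when to invoke each step.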
The agents were instructed to redact a seven-page example document containing a mix of typed text and scanned-in 'noisy' pages with handwriting and signatures. The agents needed to use the app to redact the document, go page by page to review and modify suggested redactions, and then return final redacted PDFs and log files.

I had three main questions that I wanted to answer with this experiment:

1. Can any model perform a full end-to-end redaction and review task? To establish whether this is possible at all, I first tried Sonnet 4.6 within Cursor.

2. Can small, local models perform agentic redaction and review tasks? If so, this would open up the possibility of a fully local, private redaction and review workflow. For this, I tried Qwen 3.6 27B and 35B A3B on a local system (quantised to 4 bit and run on llama.cpp on a 24GB VRAM GPU) in Hermes Agent (v0.11.0 with commit 9d1b277e). The docker compose file used to deploy this model can be found in the document redaction repo (linked below).

3. Can the biggest open-source models stand up to closed models for redaction and review tasks? For this, I tried Kimi 2.5 and Cursor Composer 2.0 (a fine-tuned version of Kimi 2.5).

Findings

The performance of each of the tested models is summarised in the table below.
I found that Sonnet 4.6 within Cursor was able to follow the instructions given, and was mostly successful (but at high cost). Qwen 3.6 27B and 35B A3B on a local system (quantised to 4 bit) completed the redaction and review task, but the quality of the output was poor: they frequently missed signatures and did not follow the full set of redaction rules given. Kimi 2.5, surprisingly, performed little better than Qwen. Cursor Composer 2.0 performed much better than Kimi, but not as well as Sonnet, showing that fine-tuning a large model can significantly improve performance. However, redaction quality varied significantly from page to page.

Conclusions

I was impressed that a local model (Qwen 3.6 27B 4 bit) running on consumer hardware (24GB VRAM) could perform the full redaction-review workflow. The quality of the output obviously could not compare to the largest models, but the fact that it could do the task at all suggests that a fully local and private redaction workflow could be within reach in a relatively short time.

In conclusion, a full end-to-end redaction workflow with agents at a quality level that could replace a human redactor is not currently possible, even with the best models. Local models are still far from being able to perform the task to a satisfactory level. However, all the models tested were able to follow the steps in the workflow and call appropriate tools, so the skillset is there; it is more a question of model quality. As AI models continue to improve in general performance, I am sure that within a year or two all local and cloud models will perform this task much better. I will continue to benchmark new models on this task as they become available.
Agents for end-to-end document redaction and review tasks (OCR and PII identification - Qwen 3.6 vs closed-source comparison)
Reddit r/LocalLLaMA / 4/27/2026
💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage · Industry & Market Moves · Models & Research
Key Points
- The post explores using AI agent workflows to perform end-to-end document redaction and review, combining OCR with local PII identification for privacy-sensitive use cases.
- It describes building “skill files” that drive a Gradio/FastAPI-based document redaction app deployed on Hugging Face Spaces, including choosing between Paddle OCR and Tesseract.
- The agents follow scripted, multi-step instructions: first redacting an input PDF page-by-page, then re-checking and correcting redaction outputs with explicit business rules (e.g., removing certain entities, ensuring box coverage, adding signature redactions).
- The work compares open-source options (including Qwen 3.6) against closed-source approaches, aiming to evaluate practical performance for redaction tasks.
- The methodology emphasizes visual verification of redaction box sizing/positioning and validation to minimize false positives while ensuring PII is properly protected.
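The scripted, multi-step review described in the points above (redact, then re-check each page against explicit business rules) can be sketched as a simple loop. The data shapes and the "drop these entity types" rule below are illustrative assumptions; the app's real review rules (box coverage checks, adding signature redactions) are richer.

```python
# Sketch of a page-by-page redaction review loop. Record shapes and the
# review rule are assumed for illustration, not taken from the real app.
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Redaction:
    entity: str  # e.g. "PERSON", "SIGNATURE"
    box: Tuple[float, float, float, float]  # (x0, y0, x1, y1) page coordinates


@dataclass
class PageResult:
    page: int
    redactions: List[Redaction] = field(default_factory=list)


def review_page(result: PageResult, drop_entities: Set[str]) -> PageResult:
    """Apply one business rule: remove unwanted entity types, keep the rest."""
    kept = [r for r in result.redactions if r.entity not in drop_entities]
    return PageResult(page=result.page, redactions=kept)


def run_review(pages: List[PageResult], drop_entities: Set[str]) -> List[PageResult]:
    """Review every page in turn and return the corrected redaction log."""
    return [review_page(p, drop_entities) for p in pages]
```

In the experiment, each iteration of such a loop was performed by the agent itself, calling the app's review endpoints and inspecting rendered pages visually rather than applying a fixed filter.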