(Links to all files, apps, and repos mentioned in this post can be found in the 'full post' link at the bottom)

Agents for document redaction and review tasks

Document redaction tasks involve text and vision capabilities, plus long-context understanding to review and redact each page of a long document. Privacy is also key, which gives a strong incentive to use local, open-source models where possible. In this post (linked at the bottom), I investigate the possibility of using agent workflows to conduct end-to-end redaction and review tasks, comparing open- and closed-source options.

To do this, skill files were developed based on agentic use of the open-source Document Redactions app/package (repo linked below) to redact and review documents. This package contains a Gradio UI app that exposes a number of FastAPI endpoints for document redaction and review functions. The agents used a deployment of this app on Hugging Face Spaces. The instructions given to the agents were chosen to present a range of complex requirements that may reflect a real-life redaction task:
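To make the skill-file idea concrete, here is a minimal sketch of how an agent might drive a redaction app's FastAPI endpoints on a Hugging Face Spaces deployment. The base URL, endpoint path, and payload fields below are illustrative assumptions, not the real API of the Document Redactions package.

```python
# Hypothetical sketch: an agent-side helper for calling a redaction app's
# FastAPI endpoints. The URL, "/redact" path, and field names are assumed
# for illustration only.
import json
from urllib import request

BASE_URL = "https://example-redaction.hf.space"  # hypothetical Spaces deployment


def build_redact_payload(pdf_name: str, ocr_engine: str = "tesseract") -> dict:
    """Assemble a JSON body for a hypothetical /redact endpoint."""
    return {
        "file": pdf_name,
        "ocr_engine": ocr_engine,  # e.g. "tesseract" or "paddle"
        "entities": ["PERSON", "EMAIL_ADDRESS", "SIGNATURE"],
    }


def call_endpoint(path: str, payload: dict) -> bytes:
    """POST a JSON payload to the deployed app (network call; sketch only)."""
    req = request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return resp.read()
```

An agent skill file would wrap calls like `call_endpoint("/redact", build_redact_payload("document.pdf"))` with instructions on when to invoke each step.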
The agents were instructed to redact a seven-page example document containing a mix of typed text and scanned-in 'noisy' pages with handwriting and signatures. The agents needed to use the app to redact the document, go page by page to review and modify suggested redactions, and then return final redacted PDFs and log files.

I had three main questions that I wanted to answer with this experiment:

1. Can any model perform a full end-to-end redaction and review task? To establish whether this is possible at all, I first tried Sonnet 4.6 within Cursor.

2. Can small, local models perform agentic redaction and review tasks? If so, this would open up the possibility of a fully local, private redaction and review workflow. For this, I tried Qwen 3.6 27B and 35B A3B on a local system (quantised to 4 bit and run on llama.cpp on a 24GB VRAM GPU) in Hermes Agent (v0.11.0 with commit 9d1b277e). The docker compose file used to deploy this model can be found in the document redaction repo (linked below).

3. Can the biggest open-source models stand up to closed models for redaction and review tasks? For this, I tried Kimi 2.5 and Cursor Composer 2.0 (a fine-tuned version of Kimi 2.5).

Findings

The performance of each of the tested models is summarised in the table below.
I found that Sonnet 4.6 within Cursor was able to follow the instructions given, and was mostly successful (but at high cost). Qwen 3.6 27B and 35B A3B on a local system (quantised to 4 bit) completed the redaction and review task, but the quality of the output was poor: they frequently missed signatures and did not follow the full set of redaction rules given. Kimi 2.5, surprisingly, performed little better than Qwen. Cursor Composer 2.0 performed much better than Kimi, but not as well as Sonnet, showing that fine-tuning a large model can significantly improve performance. However, redaction quality varied significantly from page to page.

Conclusions

I was impressed that a local model (Qwen 3.6 27B 4 bit) running on consumer hardware (24GB VRAM) could perform the full redaction-review workflow. The quality of the output obviously could not compare to the largest models, but the fact that it could do the task at all suggests that a fully local and private redaction workflow could be within reach in a relatively short time.

In conclusion, a full end-to-end redaction workflow with agents at a quality level that could replace a human redactor is not currently possible, even with the best models. Local models are still far from being able to perform the task to a satisfactory level. However, all the models tested were able to follow the steps in the workflow and call appropriate tools, so the skillset is there; it is more a question of model quality. As AI models continue to improve in general performance, I am sure that within a year or two all local and cloud models will perform this task much better. I will continue to benchmark new models on this task as they become available.
Agents for end-to-end document redaction and review tasks (OCR and PII identification - Qwen 3.6 vs closed-source comparison)
Reddit r/LocalLLaMA / 4/27/2026
💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage · Industry & Market Moves · Models & Research
Key Points
- The post explores using AI agent workflows to perform end-to-end document redaction and review, combining OCR with local PII identification for privacy-sensitive use cases.
- It describes building “skill files” that drive a Gradio/FastAPI-based document redaction app deployed on Hugging Face Spaces, including choosing between Paddle OCR and Tesseract.
- The agents follow scripted, multi-step instructions: first redacting an input PDF page-by-page, then re-checking and correcting redaction outputs with explicit business rules (e.g., removing certain entities, ensuring box coverage, adding signature redactions).
- The work compares open-source options (including Qwen 3.6) against closed-source approaches, aiming to evaluate practical performance for redaction tasks.
- The methodology emphasizes visual verification of redaction box sizing/positioning and validation to minimize false positives while ensuring PII is properly protected.
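The scripted, multi-step review described in the points above (redact, then re-check each page against explicit business rules) can be sketched as a simple loop. The data shapes and the "drop these entity types" rule below are illustrative assumptions; the app's real review rules (box coverage checks, adding signature redactions) are richer.

```python
# Sketch of a page-by-page redaction review loop. Record shapes and the
# review rule are assumed for illustration, not taken from the real app.
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Redaction:
    entity: str  # e.g. "PERSON", "SIGNATURE"
    box: Tuple[float, float, float, float]  # (x0, y0, x1, y1) page coordinates


@dataclass
class PageResult:
    page: int
    redactions: List[Redaction] = field(default_factory=list)


def review_page(result: PageResult, drop_entities: Set[str]) -> PageResult:
    """Apply one business rule: remove unwanted entity types, keep the rest."""
    kept = [r for r in result.redactions if r.entity not in drop_entities]
    return PageResult(page=result.page, redactions=kept)


def run_review(pages: List[PageResult], drop_entities: Set[str]) -> List[PageResult]:
    """Review every page in turn and return the corrected redaction log."""
    return [review_page(p, drop_entities) for p in pages]
```

In the experiment, each iteration of such a loop was performed by the agent itself, calling the app's review endpoints and inspecting rendered pages visually rather than applying a fixed filter.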