local natural language based video blurring/anonymization tool runs on 4K at 76 fps

Reddit r/LocalLLaMA / 4/2/2026


Key Points

  • The article benchmarks a locally running, natural-language-driven video anonymization tool and reports that one configuration (RF-DETR Nano Det with a skip=4 setting) can reach 76 fps at 4K.
  • It finds a clear speed-versus-flexibility tradeoff: text-prompted grounding models like Grounding DINO and Florence-2 run at about ~2 fps but allow users to describe exactly what to blur without retraining.
  • The system combines zero-shot detectors with tracking (ByteTrack) and skip-frame processing to maintain quality while reducing how often heavy detection runs, enabling real-time performance for some models.
  • It supports multiple anonymization approaches beyond bounding boxes, including instance segmentation masks (pixel-precise blurring/pixelation) and customizable blur shapes (e.g., lasso, polygon, star).
  • The tool includes multiple user interfaces (Flask web UI, a browser-based demo, and a studio/editor-style workflow) and adds additional capabilities like 360° equirectangular video support.

It's not just a text-prompt wrapper though. I benchmarked 168 combinations (7 detectors × 3 trackers × 4 skip rates × 2 resolutions) on 4K footage:

| Model | Effective FPS on 4K | What it does |
| --- | --- | --- |
| RF-DETR Nano Det + skip=4 | 76 fps | Auto-detect faces/people, real-time on 4K |
| RF-DETR Med Seg + skip=2 | 9 fps | Pixel-precise instance segmentation masks |
| Grounding DINO | ~2 fps | Text-prompted — describe what to blur |
| Florence-2 | ~2 fps | Visual grounding with natural language |
| SAM2 | varies | Click or draw a box to select what to blur |

The text-prompted models (GDINO, Florence-2) are slower (~2 fps) but the flexibility is worth it — you don't need to retrain anything, just describe what you want gone.

How it works locally:

  • Grounding DINO takes your text prompt → runs zero-shot detection on each frame → ByteTrack tracks detections across frames → blur/pixelate applied with custom shapes
  • Skip-frame tracking: run detection every Nth frame, tracker interpolates the rest. Skip=4 → 4× speedup with no visible quality loss
  • All weights download automatically on first run, everything stays local
  • Browser UI (Flask) — upload video, type your prompt, process, download
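The skip-frame idea above can be sketched in a few lines. This is a hedged illustration, not the repo's actual code: `detect` is a stub standing in for a heavy zero-shot detector like Grounding DINO, and a real tracker (ByteTrack) would refine the carried-over boxes rather than reuse them verbatim.

```python
# Sketch of skip-frame detection: run the expensive detector only on every
# Nth frame; a lightweight tracker carries boxes across the frames between.

def detect(frame_idx):
    """Stub detector: pretend one tracked box drifts right 2 px per frame."""
    x = 100 + 2 * frame_idx
    return [(x, 50, x + 40, 120)]  # one (x1, y1, x2, y2) box

def skip_frame_boxes(num_frames, skip=4):
    """Detect on frames 0, skip, 2*skip, ...; reuse the last boxes otherwise."""
    boxes_per_frame = []
    last = []
    for i in range(num_frames):
        if i % skip == 0:
            last = detect(i)          # heavy model call, only 1/skip of frames
        boxes_per_frame.append(last)  # a tracker would interpolate these
    return boxes_per_frame

boxes = skip_frame_boxes(8, skip=4)
# Detection ran only on frames 0 and 4 — roughly a 4x cut in detector calls.
```

With skip=4, the detector fires on a quarter of the frames, which is where the headline 4× speedup comes from; the tracker hides the gap visually.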

Other stuff:

  • 8 total detection models (RF-DETR, YOLO, Grounding DINO, Florence-2, SAM2, MediaPipe, Cascade)
  • 360° equirectangular video support (Insta360 X5 / GoPro Max up to 8K)
  • Custom blur shapes — lasso, polygon, star, circle drawn on detected bounding boxes
  • Instance segmentation for pixel-precise masks, not just bounding boxes
  • 3 interfaces: full studio editor, simple upload-and-process, real-time MJPEG streaming demo
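For a feel of the anonymization step itself, here is a minimal pixelation sketch over a detected bounding box, using only NumPy. It is an illustration of the general technique, not the repo's implementation; the custom shapes (lasso, star, etc.) would layer a boolean mask on top of this same block-averaging.

```python
import numpy as np

def pixelate_region(frame, box, block=8):
    """Pixelate inside box=(x1, y1, x2, y2) by averaging block x block cells.

    A shape mask (polygon, star, ...) would restrict which pixels inside the
    box get overwritten; here the whole box is pixelated for simplicity.
    """
    x1, y1, x2, y2 = box
    roi = frame[y1:y2, x1:x2].astype(float)
    h, w = roi.shape[:2]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            cell = roi[by:by + block, bx:bx + block]
            # Replace every pixel in the cell with the cell's mean color.
            roi[by:by + block, bx:bx + block] = cell.mean(axis=(0, 1))
    frame[y1:y2, x1:x2] = roi.astype(frame.dtype)
    return frame

# Toy 32x32 RGB frame; pixelate the center 16x16 region.
frame = np.arange(32 * 32 * 3, dtype=np.uint8).reshape(32, 32, 3)
out = pixelate_region(frame.copy(), (8, 8, 24, 24), block=8)
```

Segmentation-mask blurring works the same way, except the mask comes from the model per instance instead of a drawn shape.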

python -m privacy_blur.web_app --port 5001 

Runs entirely local. Repo has GIFs comparing all the model approaches side by side on the same 4K frame.

Github link

Curious what text prompts people would want to use for anonymization; the Grounding DINO integration can detect basically anything you can describe.

User preferences differ, though — what would the most common use cases be? And would it help to host this as a website, the way Photopea does? Is there demand for that?

submitted by /u/Honest-Debate-6863