Releasing the first publicly available open-source blood detection model: dataset, weights, CLI

Reddit r/MachineLearning / 2026/4/25


Key points

  • BloodshotNet is the first publicly available open-source model for detecting blood, released for Trust & Safety and content moderation use cases.
  • The release includes a dataset of 23k+ annotated images, YOLO26 small and nano model weights (AGPL-3.0), and a CLI that analyzes an image, folder, or video in one command.
  • The small model achieves roughly 0.8 precision and 0.6 recall at 40+ FPS even on CPU; for video, the authors argue detection works best as a scene-level signal rather than requiring per-frame completeness.
  • Open-vocabulary text-prompt models (e.g. YOLO-E) struggled on both precision and recall; the authors' guess is that blood's irregular patterns give a text description little for the model to work with.
  • Next steps: expanding the dataset (especially cinematic content), training a YOLO26m (medium) variant, and OpenVINO INT8 exports for faster edge inference.

Hey all, today we're releasing BloodshotNet, the world's first open-source blood detection model. We built it primarily for Trust & Safety and content moderation use cases, the idea being that it acts as a front-line filter so users and human reviewers aren't exposed to graphic imagery.

What we're open sourcing today:

  • 🤗 Dataset: 23k+ annotated images (forensic scenes, UFC footage, horror/gore movies, surgical content) with a large hard-negative slice to keep false positives in check. It quietly crossed 7k downloads before we even officially announced it
  • 🤗 Model weights: YOLO26 small and nano variants (AGPL-3.0)
  • 🐙 CLI: analyze an image, folder, or video in one command, 2 lines of setup via uv

Performance on the small model:

  • ~0.8 precision
  • ~0.6 recall
  • 40+ FPS even on CPU
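As a quick sanity check on what those two numbers mean, here is a worked example with hypothetical confusion-matrix counts; the counts are made up for illustration, only the ~0.8/~0.6 figures come from our evaluation:

```python
# Hypothetical confusion-matrix counts chosen so the derived metrics
# land on the reported ~0.8 precision / ~0.6 recall.
tp = 480  # detections that really were blood
fp = 120  # detections on blood-free frames (false alarms)
fn = 320  # blood instances the model missed

precision = tp / (tp + fp)  # how trustworthy a positive detection is
recall = tp / (tp + fn)     # how much of the real blood gets caught

print(precision)  # 0.8
print(recall)     # 0.6
```

The asymmetry is deliberate for a moderation filter: a high-precision positive is safe to act on, while the lower recall is recovered at the video level, as described below.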

A few things we found interesting while building this:

The recall number looks modest, but in practice it works well for video. Blood in high-contrast action/gore scenes gets caught reliably. For borderline cases, a sliding window over 5–10 second clips is the right approach; you don't need per-frame perfection, just a scene-level signal.
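The scene-level idea can be sketched as a sliding window over per-frame detections. Everything here (function name, parameters, default thresholds) is illustrative, not the project's actual API:

```python
from collections import deque

def scene_flags(frame_hits, fps=30, window_s=5, min_hit_ratio=0.2):
    """Slide a window_s-second window over per-frame detections
    (1 = blood detected in that frame, 0 = not) and flag a scene
    when enough of the window's frames fired."""
    win = deque(maxlen=fps * window_s)
    flags = []
    for hit in frame_hits:
        win.append(hit)
        # Only emit a positive once the window is full, so a few
        # noisy frames can't trigger a flag on their own.
        flags.append(len(win) == win.maxlen
                     and sum(win) / len(win) >= min_hit_ratio)
    return flags
```

With this shape, a few missed frames inside a gory scene don't matter: the window fires as long as enough frames in the clip detect blood, which is exactly the scene-level rather than per-frame signal described above.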

We tried open-vocabulary/text-prompt models like YOLO-E, and they genuinely struggled. Both recall and precision were bad. Our guess is a combination of filtered training data and the fact that blood has irregular enough patterns that a text description doesn't give the model much to work with. YOLO26 with ProgLoss + STAL was noticeably better, specifically for small objects like tiny droplets, and the training/augmentation tooling is just really solid.

We did consider transformer architectures as they'd theoretically handle the fluid dynamics and frame-to-frame context much better. The blocker is data: annotated video datasets for this basically don't exist and are hard to produce. YOLO26 also wins on latency and training stability, so it was the right call for now.

What's next:

  • Expanding the dataset, specifically with more annotated cinematic content
  • Training a YOLO26m (medium) variant
  • OpenVINO INT8 exports for faster edge inference

If you want the full technical breakdown, we wrote it up here: article

Would love to know what you end up using it for. Contributions are welcome!

submitted by /u/PeterHash