FryNet: Dual-Stream Adversarial Fusion for Non-Destructive Frying Oil Oxidation Assessment

arXiv cs.CV / 4/24/2026


Key Points

  • The paper addresses a key limitation of current frying-oil oxidation monitoring, noting that wet-chemistry assays are destructive, lack spatial information, and cannot support real-time use.
  • It identifies a “camera-fingerprint shortcut” in thermal-image inspection models, where networks overfit to sensor-specific noise and thermal bias rather than learning oxidation chemistry, causing performance to collapse under video-disjoint evaluation (testing on videos unseen during training).
  • The proposed FryNet uses a dual-stream RGB–thermal architecture to segment the oil region, classify serviceability, and regress four oxidation-related indices (peroxide value (PV), p-anisidine value (p-AV), Totox, and temperature) in a single forward pass.
  • FryNet’s design combines a ThermalMiT-B2 backbone with attention, an RGB-MAE encoder trained with masked autoencoding and chemical alignment, and a dual-encoder DANN adversarial regularization (via Gradient Reversal Layers) plus FiLM fusion to connect thermal structure with RGB chemical context.
  • On 7,226 paired frames from 28 frying videos, FryNet reports strong results: 98.97% mIoU for segmentation, 100% classification accuracy, and 2.32 mean regression MAE, outperforming all seven baselines and remaining robust under video-disjoint evaluation.
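
The adversarial regularization described above hinges on a Gradient Reversal Layer (GRL): it acts as the identity in the forward pass, but negates and scales gradients on the way back, so the encoder is pushed to make video identity unpredictable to the domain classifier. A minimal PyTorch sketch of a GRL (class and variable names are illustrative, not taken from the paper):

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies gradients by -lam in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse and scale the gradient flowing to the encoder; no grad for lam.
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Toy check: features pass through unchanged, but the gradient is sign-flipped.
x = torch.ones(3, requires_grad=True)
y = grad_reverse(x, lam=0.5)
y.sum().backward()
print(y.detach().tolist())  # [1.0, 1.0, 1.0] (identity forward)
print(x.grad.tolist())      # [-0.5, -0.5, -0.5] (reversed, scaled gradient)
```

In a DANN-style setup, the domain (here, video-identity) classifier sits after this layer, so minimizing its loss simultaneously trains the encoder to remove the domain cue.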

Abstract

Monitoring frying oil degradation is critical for food safety, yet current practice relies on destructive wet-chemistry assays that provide no spatial information and are unsuitable for real-time use. We identify a fundamental obstacle in thermal-image-based inspection: the camera-fingerprint shortcut, whereby models memorize sensor-specific noise and thermal bias instead of learning oxidation chemistry, collapsing under video-disjoint evaluation. We propose FryNet, a dual-stream RGB-thermal framework that jointly performs oil-region segmentation, serviceability classification, and regression of four chemical oxidation indices (PV, p-AV, Totox, temperature) in a single forward pass. A ThermalMiT-B2 backbone with channel and spatial attention extracts thermal features, while an RGB-MAE Encoder learns chemically grounded representations via masked autoencoding and chemical alignment. Dual-Encoder DANN adversarially regularizes both streams against video identity via Gradient Reversal Layers, and FiLM fusion bridges thermal structure with RGB chemical context. On 7,226 paired frames across 28 frying videos, FryNet achieves 98.97% mIoU, 100% classification accuracy, and 2.32 mean regression MAE, outperforming all seven baselines.
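
The FiLM fusion mentioned in the abstract conditions one stream's feature maps on the other through a learned per-channel scale (gamma) and shift (beta). A minimal NumPy sketch of the FiLM operation as it might bridge RGB chemical context into the thermal stream (all names, shapes, and the linear conditioning heads are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def film(thermal, cond, W_gamma, W_beta):
    """FiLM: per-channel affine modulation of `thermal` (B, C, H, W),
    with gamma and beta predicted from the conditioning vector `cond` (B, D)."""
    gamma = cond @ W_gamma  # (B, C) per-channel scale
    beta = cond @ W_beta    # (B, C) per-channel shift
    # Broadcast the (B, C) modulation over the spatial dimensions.
    return gamma[:, :, None, None] * thermal + beta[:, :, None, None]

# Toy shapes: 2 frames, 4 thermal channels, 3x3 spatial, 5-dim RGB context.
rng = np.random.default_rng(0)
thermal = rng.standard_normal((2, 4, 3, 3))
cond = rng.standard_normal((2, 5))
W_gamma = rng.standard_normal((5, 4))
W_beta = rng.standard_normal((5, 4))
out = film(thermal, cond, W_gamma, W_beta)
print(out.shape)  # (2, 4, 3, 3)
```

Because gamma and beta are computed from the RGB stream, the thermal features are re-weighted channel by channel according to the chemical context, which is the intuition behind using FiLM rather than plain concatenation.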