MEMO: Human-like Crisp Edge Detection Using Masked Edge Prediction

arXiv cs.CV / 3/24/2026


Key Points

  • The paper introduces MEMO (Masked Edge Prediction Model), showing that human-like crisp, single-pixel edge outputs can be achieved using only cross-entropy loss without specialized loss functions or architecture changes.
  • MEMO is pre-trained on a large synthetic edge dataset to improve generalization, then fine-tuned on downstream tasks with a lightweight module adding only about 1.2% extra parameters.
  • During training, the model learns to predict edges under different input masking ratios, improving robustness and enabling crispness at inference.
  • The key inference insight is that thick edge predictions typically show a confidence gradient, high at the center and lower toward the boundaries. MEMO exploits this with a progressive prediction strategy that finalizes pixels in order of prediction confidence, producing thinner, more precise contours.
  • Experiments report that MEMO outperforms prior approaches on crispness-aware evaluations and that its edge maps need no post-processing.
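
The confidence-ordered inference described in the points above can be illustrated with a minimal sketch. It is not the paper's implementation: the `Predictor` interface, the equal-share finalization schedule, and the `toy_predictor` stand-in (which simply lowers a pixel's edge probability once a neighbor has been finalized as an edge) are all assumptions made for illustration, and the example uses a 1D row of pixels rather than an image.

```python
from typing import Callable, List, Optional

# Hypothetical interface: given the labels finalized so far (None = undecided),
# return an edge probability for every pixel.
Predictor = Callable[[List[Optional[int]]], List[float]]


def progressive_predict(predict: Predictor, n_pixels: int, steps: int = 4,
                        threshold: float = 0.5) -> List[int]:
    """Finalize pixels in order of confidence, re-predicting after each step."""
    labels: List[Optional[int]] = [None] * n_pixels
    for step in range(steps):
        undecided = [i for i, v in enumerate(labels) if v is None]
        if not undecided:
            break
        probs = predict(labels)
        # Confidence of a binary prediction: probability of the more likely class.
        undecided.sort(key=lambda i: max(probs[i], 1.0 - probs[i]), reverse=True)
        # Assumed schedule: an equal share of the remaining pixels per step,
        # so the last step finalizes everything that is left.
        count = -(-len(undecided) // (steps - step))  # ceiling division
        for i in undecided[:count]:
            labels[i] = 1 if probs[i] >= threshold else 0
    return [v if v is not None else 0 for v in labels]


def toy_predictor(labels: List[Optional[int]]) -> List[float]:
    """Stand-in for a trained model: a thick edge with a confidence gradient."""
    base = [0.1, 0.6, 0.9, 0.6, 0.1]  # high in the center, lower at the sides
    probs = []
    for i, p in enumerate(base):
        neighbours = [labels[j] for j in (i - 1, i + 1) if 0 <= j < len(base)]
        # Toy assumption: a finalized edge next door suppresses this pixel.
        probs.append(p * 0.5 if 1 in neighbours else p)
    return probs


one_shot = [1 if p >= 0.5 else 0 for p in toy_predictor([None] * 5)]
progressive = progressive_predict(toy_predictor, 5, steps=2)
print(one_shot)     # thresholding once keeps the whole thick band
print(progressive)  # confidence-ordered finalization keeps only the center
```

In this toy setting, thresholding once marks three adjacent pixels as edge, while the progressive loop commits to the confident center first and then rejects its lower-confidence neighbors. How a real model conditions on already finalized pixels is learned during training and is not specified by this sketch.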

Abstract

Learning-based edge detection models trained with cross-entropy loss often suffer from thick edge predictions, which deviate from the crisp, single-pixel annotations typically provided by humans. While previous approaches to achieving crisp edges have focused on designing specialized loss functions or modifying network architectures, we show that a carefully designed training and inference strategy alone is sufficient to achieve human-like edge quality. In this work, we introduce the Masked Edge Prediction MOdel (MEMO), which produces both accurate and crisp edges using only cross-entropy loss. We first construct a large-scale synthetic edge dataset to pre-train MEMO, enhancing its generalization ability. Subsequent fine-tuning on downstream datasets requires only a lightweight module comprising 1.2% additional parameters. During training, MEMO learns to predict edges under varying ratios of input masking. A key insight guiding our inference is that thick edge predictions typically exhibit a confidence gradient: high in the center and lower toward the boundaries. Leveraging this, we propose a novel progressive prediction strategy that sequentially finalizes edge predictions in order of prediction confidence, resulting in thinner and more precise contours. Our method achieves visually appealing, post-processing-free, human-like edge maps and outperforms prior methods on crispness-aware evaluations.
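
The training setup in the abstract, prediction "under varying ratios of input masking" with plain cross-entropy, might be pictured roughly as follows. The abstract does not say exactly what is masked or which pixels contribute to the loss, so this sketch makes two labeled assumptions: the masked input is a partially hidden edge-label map, and the loss is averaged over the hidden pixels only. The `MASK` token and both function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

MASK = -1  # hypothetical token marking a hidden edge label


def mask_edge_labels(edge_labels: List[int], ratio: float,
                     rng: random.Random) -> Tuple[List[int], List[int]]:
    """Hide round(ratio * n) randomly chosen labels.

    Returns the masked copy and the sorted indices that were hidden.
    """
    n = len(edge_labels)
    hidden = sorted(rng.sample(range(n), round(ratio * n)))
    masked = list(edge_labels)
    for i in hidden:
        masked[i] = MASK
    return masked, hidden


def masked_cross_entropy(probs: List[float], edge_labels: List[int],
                         hidden: List[int]) -> float:
    """Mean binary cross-entropy over the hidden pixels only (an assumption)."""
    eps = 1e-7
    total = 0.0
    for i in hidden:
        p = min(max(probs[i], eps), 1.0 - eps)  # clamp to keep log() finite
        total -= math.log(p) if edge_labels[i] == 1 else math.log(1.0 - p)
    return total / max(len(hidden), 1)


rng = random.Random(0)
labels = [0, 0, 1, 0, 0, 1, 0, 0]
ratio = rng.random()  # a fresh masking ratio for each training sample
masked, hidden = mask_edge_labels(labels, ratio, rng)
loss = masked_cross_entropy([0.5] * len(labels), labels, hidden)
```

Sampling a new ratio per example exposes the model to everything from a fully hidden map to a nearly complete one, which is the range of conditions a progressive inference loop would pass through as it finalizes pixels.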