GAFSV-Net: A Vision Framework for Online Signature Verification

arXiv cs.CV / 5/4/2026

💬 Opinion · Models & Research

Key Points

  • The paper introduces GAFSV-Net, a new framework for online signature verification that tackles skilled forgeries and small enrollment sets under high intra-class variability.
  • Instead of modeling signatures as raw 1D temporal sequences, it converts each signature into a six-channel asymmetric Gramian Angular Field image using kinematic signals (pen speed, pressure derivative, and direction angle) encoded via complementary GASF and GADF matrices.
  • The model uses a dual-branch ConvNeXt-Tiny encoder with bidirectional cross-attention so each branch can leverage discriminative patterns from the other before projecting into a metric space.
  • Training combines a semi-hard triplet loss with skilled-forgery hard-negative injection, and inference matches each query via cosine similarity against a prototype built from a small enrollment set.
  • Experiments on DeepSignDB and BiosecurID show improved performance over sequence-based deep learning baselines with matched objectives, supported by ablation studies that quantify the impact of each design choice.
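The GASF/GADF construction in the second point can be sketched with the standard Gramian Angular Field formulas (the paper's "asymmetric" variant may differ in detail); `gaf_pair` and the toy signals below are illustrative, not the authors' code:

```python
import numpy as np

def gaf_pair(x, eps=1e-8):
    """Encode a 1D kinematic signal as a (GASF, GADF) image pair.

    Minimal sketch of standard GAF encoding: rescale to [-1, 1],
    map each sample to a polar angle, then take pairwise trig sums
    and differences.
    """
    x = np.asarray(x, dtype=float)
    # Rescale the series to [-1, 1] so arccos is defined.
    x = 2.0 * (x - x.min()) / (x.max() - x.min() + eps) - 1.0
    x = np.clip(x, -1.0, 1.0)
    phi = np.arccos(x)  # polar angle of each sample
    # GASF[i, j] = cos(phi_i + phi_j): temporal co-occurrence (symmetric).
    gasf = np.cos(phi[:, None] + phi[None, :])
    # GADF[i, j] = sin(phi_i - phi_j): directional transitions (antisymmetric).
    gadf = np.sin(phi[:, None] - phi[None, :])
    return gasf, gadf

# Stack three kinematic channels (pen speed, pressure derivative,
# direction angle) into a six-channel image, as in the paper's input.
signals = [np.random.rand(64) for _ in range(3)]  # placeholder signals
channels = [m for s in signals for m in gaf_pair(s)]
image = np.stack(channels)  # shape (6, 64, 64)
```

Encoding each channel twice is what yields six image planes from three signals: the symmetric GASF and antisymmetric GADF views carry complementary information about the same sequence.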

Abstract

Online signature verification (OSV) requires distinguishing skilled forgeries from genuine samples under high intra-class variability and with very few enrollment samples. Existing deep learning methods operate directly on raw temporal sequences, restricting them to 1D architectures and preventing the use of pretrained 2D vision backbones. We bridge this gap with GAFSV-Net, which represents each signature as a six-channel asymmetric Gramian Angular Field image: three kinematic channels (pen speed, pressure derivative, direction angle) are each encoded into complementary GASF and GADF matrices that capture pairwise temporal co-occurrence and directional transition structure, respectively. A dual-branch ConvNeXt-Tiny encoder processes GASF and GADF independently, with bidirectional cross-attention enabling each branch to query discriminative patterns from the other before metric-space projection. Training uses a semi-hard triplet loss with skilled-forgery hard-negative injection; verification is performed via cosine similarity against a small enrollment prototype. On DeepSignDB and BiosecurID, GAFSV-Net outperforms all sequence-based baselines trained under identical objectives, demonstrating that the representational gain of 2D temporal encoding is consistent and independent of the training procedure; ablations characterise the contribution of each design choice.
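The prototype-based verification rule described above can be sketched as follows; `verify`, the embedding dimension, and the threshold value are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def verify(query_emb, enroll_embs, threshold=0.5):
    """Verify a query embedding against a small enrollment set.

    Sketch of prototype-based inference: average the L2-normalized
    enrollment embeddings into a prototype and accept the query if
    its cosine similarity to the prototype exceeds a threshold
    (the threshold here is an illustrative placeholder).
    """
    def l2norm(v):
        return v / (np.linalg.norm(v, axis=-1, keepdims=True) + 1e-8)

    prototype = l2norm(l2norm(np.asarray(enroll_embs)).mean(axis=0))
    score = float(np.dot(l2norm(np.asarray(query_emb)), prototype))
    return score, score >= threshold

# Toy usage with random 128-d embeddings standing in for encoder output.
rng = np.random.default_rng(0)
enroll = rng.normal(size=(4, 128))
score, accepted = verify(enroll[0], enroll)
```

Averaging in the normalized embedding space keeps any single atypical enrollment sample from dominating the prototype, which matters when only a handful of genuine signatures are available.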