AI Navigate

Combining Microscopy Data and Metadata for Reconstruction of Cellular Traction Forces Using a Hybrid Vision Transformer-U-Net

arXiv cs.CV / 3/17/2026

📰 News · Models & Research

Key Points

  • A new hybrid deep learning architecture called ViT+UNet combines U‑Net with a Vision Transformer to reconstruct cellular traction force fields from microscopy data and metadata.
  • The model outperforms both standalone U‑Net and standalone Vision Transformer in predicting traction force fields across multiple spatial scales and noise levels.
  • The approach enables the inclusion of contextual metadata, such as cell-type information, to enhance prediction specificity and accuracy.
  • It demonstrates robust generalization across different experimental setups and imaging systems, suggesting broad applicability to diverse TFM datasets.

Abstract

Traction force microscopy (TFM) is a widely used technique for quantifying the forces that cells exert on their surrounding extracellular matrix. Although deep learning methods have recently been applied to TFM data analysis, several challenges remain, particularly achieving reliable inference across multiple spatial scales and integrating additional contextual information such as cell type to improve accuracy. In this study, we propose ViT+UNet, a robust deep learning architecture that integrates a U-Net with a Vision Transformer. Our results demonstrate that this hybrid model outperforms both standalone U-Net and Vision Transformer architectures in predicting traction force fields. Furthermore, ViT+UNet exhibits superior generalization across diverse spatial scales and varying noise levels, enabling its application to TFM datasets obtained from different experimental setups and imaging systems. By appropriately structuring the input data, our approach also allows the inclusion of metadata, in our case cell-type information, to enhance prediction specificity and accuracy.
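To make the idea concrete, here is a minimal sketch of what a hybrid of this kind could look like in PyTorch. This is not the authors' implementation; all layer sizes, names, and the specific way metadata is injected (a one-hot cell-type code broadcast as extra input channels) are illustrative assumptions. It pairs a small U-Net-style encoder/decoder with a Transformer bottleneck over the coarse feature map, and maps a displacement field to a two-component traction field.

```python
import torch
import torch.nn as nn


class ConvBlock(nn.Module):
    """Two 3x3 convolutions with ReLU, the usual U-Net building block."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(),
            nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)


class ViTUNet(nn.Module):
    """Toy hybrid (illustrative, not the paper's architecture):
    U-Net encoder/decoder with a Transformer self-attention bottleneck.
    Input: a 2-channel bead-displacement field plus `meta_ch` metadata
    channels (here: a broadcast one-hot cell-type code).
    Output: a 2-channel traction field (t_x, t_y)."""
    def __init__(self, in_ch=2, meta_ch=3, base=16, n_heads=4, n_layers=2):
        super().__init__()
        c = base
        self.enc1 = ConvBlock(in_ch + meta_ch, c)
        self.enc2 = ConvBlock(c, 2 * c)
        self.pool = nn.MaxPool2d(2)
        # ViT-style bottleneck: treat each coarse spatial position as a token
        self.attn = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=2 * c, nhead=n_heads,
                                       dim_feedforward=4 * c, batch_first=True),
            num_layers=n_layers)
        self.up = nn.ConvTranspose2d(2 * c, c, 2, stride=2)
        self.dec = ConvBlock(2 * c, c)
        self.head = nn.Conv2d(c, 2, 1)  # predict (t_x, t_y)

    def forward(self, x, meta):
        # meta: (B, K) one-hot -> broadcast to (B, K, H, W), concat as channels
        B, _, H, W = x.shape
        m = meta[:, :, None, None].expand(-1, -1, H, W)
        s1 = self.enc1(torch.cat([x, m], dim=1))   # (B, c, H, W)
        z = self.enc2(self.pool(s1))               # (B, 2c, H/2, W/2)
        Bz, C, h, w = z.shape
        tokens = z.flatten(2).transpose(1, 2)      # (B, h*w, 2c)
        tokens = self.attn(tokens)                 # global self-attention
        z = tokens.transpose(1, 2).reshape(Bz, C, h, w)
        u = self.up(z)                             # back to (B, c, H, W)
        u = self.dec(torch.cat([u, s1], dim=1))    # U-Net skip connection
        return self.head(u)


model = ViTUNet()
disp = torch.randn(1, 2, 32, 32)        # synthetic displacement field
cell_type = torch.eye(3)[[1]]           # one-hot code for "cell type 1"
traction = model(disp, cell_type)       # shape: (1, 2, 32, 32)
```

Broadcasting the one-hot code as constant input channels is one simple way to "appropriately structure the input data" so the network can condition on cell type; the paper may use a different conditioning scheme.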