EPIR: An Efficient Patch Tokenization, Integration and Representation Framework for Micro-expression Recognition

arXiv cs.CV / 4/10/2026

Key Points

The paper introduces EPIR, an efficient framework for micro-expression recognition that targets the high computational cost of Transformer-based models caused by large token counts in self-attention.
EPIR uses a dual norm shifted tokenization (DNSPT) module to better learn spatial relationships in the face region via refined spatial transformation and dual norm projection.
It reduces token overhead through a token integration module that merges partial tokens across cascaded Transformer blocks while aiming to avoid information loss.
A discriminative token extractor modifies the attention in the Transformer block to reduce unnecessary focus on self-tokens, then uses a dynamic token selection module (DTSM) to keep the key, more informative tokens.
Experiments on four public micro-expression datasets (CASME II, SAMM, SMIC, and CAS(ME)^3) show gains over state-of-the-art methods, including a 9.6% UF1 improvement on CAS(ME)^3 and a 4.58% UAR improvement on SMIC.
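
The two metrics quoted above are class-balanced scores: UF1 is the unweighted (macro) average of per-class F1, and UAR is the unweighted average of per-class recall. The sketch below shows how they are typically computed; the function name and the toy labels are illustrative assumptions and are not taken from the paper.

```python
def uf1_and_uar(y_true, y_pred):
    """Return (UF1, UAR): macro-averaged F1 and macro-averaged recall.

    Classes are taken from y_true, so every class has at least one sample.
    """
    classes = sorted(set(y_true))
    pairs = list(zip(y_true, y_pred))
    f1_scores, recalls = [], []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        f1_scores.append(2 * tp / (2 * tp + fp + fn))  # per-class F1
        recalls.append(tp / (tp + fn))                 # per-class recall
    return sum(f1_scores) / len(classes), sum(recalls) / len(classes)


# Toy, imbalanced example (hypothetical labels): accuracy is 4/6, but the
# missed minority class "sur" pulls both balanced metrics down.
y_true = ["neg", "neg", "neg", "neg", "pos", "sur"]
y_pred = ["neg", "neg", "neg", "pos", "pos", "neg"]
uf1, uar = uf1_and_uar(y_true, y_pred)
print(round(uf1, 4), round(uar, 4))  # prints 0.4722 0.5833
```

Because every class counts equally regardless of its size, these metrics suit micro-expression datasets, which the abstract describes as small-scale.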

Abstract

Micro-expression recognition can reveal an individual's genuine emotion at a given moment. Although deep learning-based methods, especially Transformer-based methods, have achieved impressive results, they have high computational complexity due to the large number of tokens in multi-head self-attention. In addition, existing micro-expression datasets are small-scale, which makes it difficult for Transformer-based models to learn effective micro-expression representations. Therefore, we propose a novel Efficient Patch tokenization, Integration and Representation framework (EPIR), which balances high recognition performance and low computational complexity. Specifically, we first propose a dual norm shifted tokenization (DNSPT) module to learn the spatial relationship between neighboring pixels in the face region, which is implemented by a refined spatial transformation and dual norm projection. Then, we propose a token integration module to integrate partial tokens among multiple cascaded Transformer blocks, thereby reducing the number of tokens without information loss. Furthermore, we design a discriminative token extractor, which first improves the attention in the Transformer block to reduce the unnecessary focus of the attention calculation on self-tokens, and then uses the dynamic token selection module (DTSM) to select key tokens, thereby capturing more discriminative micro-expression representations. We conduct extensive experiments on four popular public datasets (i.e., CASME II, SAMM, SMIC, and CAS(ME)^3). The experimental results show that our method achieves significant performance gains over the state-of-the-art methods, such as a 9.6% improvement on the CAS(ME)^3 dataset in terms of UF1 and a 4.58% improvement on the SMIC dataset in terms of UAR.
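
The abstract does not give implementation details, so the following is only a generic sketch of two ideas it mentions: attention that ignores each token's weight on itself, and keeping only the highest-scoring tokens. Masking the diagonal with negative infinity and scoring tokens by the average attention they receive are assumptions made for illustration; the actual discriminative token extractor and DTSM may work differently. All function names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(q, k, mask_self=True):
    """Scaled dot-product attention weights for at least two tokens.

    With mask_self=True, each token's score on itself is set to -inf,
    so its softmax weight on itself becomes exactly zero (an assumed
    way to remove self-token focus, not the paper's stated method).
    """
    d = len(q[0])
    weights = []
    for i, qi in enumerate(q):
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        if mask_self:
            scores[i] = -math.inf
        weights.append(softmax(scores))
    return weights


def select_top_k(tokens, scores, k):
    """Keep the k highest-scoring tokens, preserving their original order."""
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:k])
    return [tokens[i] for i in keep], keep


# Four toy 2-dimensional tokens (hypothetical values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
w = attention_weights(tokens, tokens)
n = len(tokens)
# Assumed importance score: mean attention each token receives from all queries.
received = [sum(w[i][j] for i in range(n)) / n for j in range(n)]
kept, kept_idx = select_top_k(tokens, received, k=2)
print(kept_idx)
```

Since the cost of self-attention grows quadratically with the number of tokens, keeping half of the tokens reduces the attention-matrix size to roughly a quarter, which is the kind of saving that token integration and selection aim for.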
