DAPA: Distribution Aware Piecewise Activation Functions for On-Device Transformer Inference and Training

arXiv cs.LG / 3/23/2026



Key Points

We propose Distribution-Aware Piecewise Activation (DAPA), a differentiable, hardware-friendly activation for Transformer models on edge devices that leverages the distribution of pre-activation data.
DAPA uses a non-uniform piecewise approximation with finer segments in high-probability regions to improve generalization over prior piecewise-linear methods.
It is quantized using Distribution-Weighted Mean Square Error to reduce latency and resource usage for hardware deployment.
An HLS implementation shows that DAPA speeds up GELU computation by 16x and cuts DSP utilization by 16x, while maintaining or improving performance on vision Transformers and GPT-2.
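The idea of placing finer segments where pre-activations are most likely can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes standard-normal samples as stand-in pre-activations and puts breakpoints at equally spaced sample quantiles, which is one simple way to get narrower segments in dense regions. All function names are hypothetical.

```python
import bisect
import math
import random


def gelu(x):
    """Exact GELU, x * Phi(x), via the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def quantile_breakpoints(samples, num_segments):
    """Breakpoints at equally spaced quantiles, so each segment holds
    roughly the same share of samples (narrow where data is dense)."""
    ordered = sorted(samples)
    last = len(ordered) - 1
    points = {ordered[round(i * last / num_segments)] for i in range(num_segments + 1)}
    return sorted(points)  # the set drops duplicate breakpoints


def uniform_breakpoints(lo, hi, num_segments):
    """Equally spaced breakpoints, the baseline for comparison."""
    step = (hi - lo) / num_segments
    return [lo + i * step for i in range(num_segments)] + [hi]


def piecewise_linear(x, breakpoints, fn=gelu):
    """Interpolate fn linearly between neighbouring breakpoints.
    Inputs outside the covered range are clamped to its ends."""
    x = min(max(x, breakpoints[0]), breakpoints[-1])
    i = min(bisect.bisect_right(breakpoints, x) - 1, len(breakpoints) - 2)
    x0, x1 = breakpoints[i], breakpoints[i + 1]
    t = (x - x0) / (x1 - x0)
    return (1.0 - t) * fn(x0) + t * fn(x1)


def empirical_mse(samples, breakpoints):
    """Mean squared error against exact GELU, averaged over the samples."""
    return sum((piecewise_linear(s, breakpoints) - gelu(s)) ** 2 for s in samples) / len(samples)


# Assumption: Gaussian samples stand in for real pre-activation data.
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(5000)]
aware = quantile_breakpoints(samples, 8)
uniform = uniform_breakpoints(min(samples), max(samples), 8)
print("quantile breakpoints MSE:", empirical_mse(samples, aware))
print("uniform breakpoints MSE: ", empirical_mse(samples, uniform))
```

Averaging the squared error over the samples weights each region by how often inputs land there, so with the same segment budget the quantile placement should score lower than the uniform one on data like this.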

Abstract

Non-linear activation functions play a pivotal role in on-device inference and training, as they not only consume substantial hardware resources but also impose a significant impact on system performance and energy efficiency. In this work, we propose Distribution-Aware Piecewise Activation (DAPA), a differentiable and hardware-friendly activation function for Transformer architectures by exploiting the distribution of pre-activation data. DAPA employs a non-uniform piecewise approximation that allocates finer segments to high-probability regions of the distribution, improving generalizability over prior piecewise linear methods. The resulting approximation is further quantized using Distribution-Weighted Mean Square Error to reduce latency and resource utilization for hardware deployment. Our HLS implementation demonstrates that DAPA speeds up GELU computation by 16× and decreases DSP utilization by 16× while maintaining comparable or better performance across vision Transformers and GPT-2 models.
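The abstract does not spell out how Distribution-Weighted Mean Square Error is computed. The sketch below shows one plausible reading, in which the squared quantization error at each input is weighted by how often that input occurs. As assumptions for illustration only, it uses a histogram of standard-normal samples as weights, rounds the GELU output to fixed point (rather than quantizing piecewise parameters), and searches for the smallest fractional bit width that meets a tolerance. The function names and the tolerance value are hypothetical.

```python
import math
import random


def gelu(x):
    """Exact GELU, x * Phi(x), via the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def to_fixed_point(value, frac_bits):
    """Round to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def histogram(samples, lo, hi, bins):
    """Bin centers and sample counts over [lo, hi); outliers are ignored."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in samples:
        if lo <= s < hi:
            counts[min(int((s - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    return centers, counts


def weighted_mse(reference, approx, points, weights):
    """Squared error at each point, averaged with normalized weights."""
    total = sum(weights)
    return sum(w * (reference(x) - approx(x)) ** 2 for x, w in zip(points, weights)) / total


def quantization_error(frac_bits, centers, counts):
    """Weighted MSE between GELU and its fixed-point-rounded output."""
    return weighted_mse(gelu, lambda x: to_fixed_point(gelu(x), frac_bits), centers, counts)


def smallest_frac_bits(centers, counts, tolerance, max_bits=16):
    """Fewest fractional bits whose weighted MSE meets the tolerance, else None."""
    for bits in range(1, max_bits + 1):
        if quantization_error(bits, centers, counts) <= tolerance:
            return bits
    return None


# Assumption: Gaussian samples stand in for real pre-activation data.
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(5000)]
centers, counts = histogram(samples, -4.0, 4.0, 64)
print("fractional bits needed:", smallest_frac_bits(centers, counts, 1e-5))
```

Weighting by the histogram means rounding error in rarely visited input ranges counts for little, so a narrower word length can pass the tolerance than an unweighted error measure would allow.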


