Dynamic Tokenization via Reinforcement Patching: End-to-end Training and Zero-shot Transfer

arXiv cs.AI / 3/30/2026

💬 OpinionSignals & Early TrendsIdeas & Deep AnalysisModels & Research

共有:

Key Points

The paper introduces Reinforcement Patching (ReinPatch), a framework that learns variable-sized, data-adaptive patch boundaries for long-horizon sequence data (especially continuous time series) in an end-to-end way.
ReinPatch formulates patch placement as a discrete decision process optimized with reinforcement learning using Group Relative Policy Gradient (GRPG), avoiding soft discretization, continuous relaxations, and heuristic boundary rules.
The method can strictly enforce a target compression rate while letting the downstream backbone scale efficiently, and it supports multi-level hierarchical modeling.
Experiments on time-series forecasting datasets show ReinPatch outperforming state-of-the-art data-driven patching strategies.
Because the patching module can be detached as a standalone foundation patcher, the authors claim it provides interpretable visual and empirical insights into how a performance-driven model segments sequences.

Abstract

Efficiently aggregating spatial or temporal horizons to acquire compact representations has become a unifying principle in modern deep learning models, yet learning data-adaptive representations for long-horizon sequence data, especially continuous sequences like time series, remains an open challenge. While fixed-size patching has improved scalability and performance, discovering variable-sized, data-driven patches end-to-end often forces models to rely on soft discretization, specific backbones, or heuristic rules. In this work, we propose Reinforcement Patching (ReinPatch), the first framework to jointly optimize a sequence patching policy and its downstream sequence backbone model using reinforcement learning. By formulating patch boundary placement as a discrete decision process optimized via Group Relative Policy Gradient (GRPG), ReinPatch bypasses the need for continuous relaxations and performs dynamic patching policy optimization in a natural manner. Moreover, our method allows strict enforcement of a desired compression rate, freeing the downstream backbone to scale efficiently, and naturally supports multi-level hierarchical modeling. We evaluate ReinPatch on time-series forecasting datasets, where it demonstrates compelling performance compared to state-of-the-art data-driven patching strategies. Furthermore, our detached design allows the patching module to be extracted as a standalone foundation patcher, providing the community with visual and empirical insights into the segmentation behaviors preferred by a purely performance-driven neural patching strategy.

Black Hat Asia

AI Business

Freedom and Constraints of Autonomous Agents — Self-Modification, Trust Boundaries, and Emergent Gameplay

Dev.to

Von Hammerstein’s Ghost: What a Prussian General’s Officer Typology Can Teach Us About AI Misalignment

Reddit r/artificial

Stop Tweaking Prompts: Build a Feedback Loop Instead

Dev.to

Privacy-Preserving Active Learning for autonomous urban air mobility routing under real-time policy constraints

Dev.to

Dynamic Tokenization via Reinforcement Patching: End-to-end Training and Zero-shot Transfer

Key Points

Abstract

Related Articles

Black Hat Asia

Freedom and Constraints of Autonomous Agents — Self-Modification, Trust Boundaries, and Emergent Gameplay

Von Hammerstein’s Ghost: What a Prussian General’s Officer Typology Can Teach Us About AI Misalignment

Stop Tweaking Prompts: Build a Feedback Loop Instead

Privacy-Preserving Active Learning for autonomous urban air mobility routing under real-time policy constraints

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer