IRIS: Interleaved Reinforcement with Incremental Staged Curriculum for Cross-Lingual Mathematical Reasoning

arXiv cs.CL / 28 Apr 2026


Key Points

  • The paper introduces IRIS, a two-axis curriculum framework for cross-lingual mathematical reasoning that combines supervised fine-tuning on progressively harder problems with reverse-curriculum reinforcement learning to reduce reliance on step-by-step guidance.
  • IRIS uses a composite reward that covers answer correctness, step-level alignment, continuity of reasoning, and numeric incentives, optimized with Group Relative Policy Optimization (GRPO).
  • The authors release CL-Math, a dataset of 29k multilingual math problems with step-level annotations in English, Hindi, and Marathi, intended to support multilingual reasoning research.
  • Experiments across benchmarks and curated multilingual test sets show consistent performance gains, particularly for low-resource and bilingual settings, with smaller improvements for high-resource languages.
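The two RL ingredients above, a weighted composite reward and GRPO's group-relative advantage, can be sketched in a few lines. The component weights and scores below are illustrative assumptions, not values from the paper; GRPO's key move is normalizing each sampled response's reward against its own sampling group rather than training a separate critic:

```python
import statistics

# Illustrative component weights (assumptions, not the paper's values).
WEIGHTS = {"correctness": 1.0, "step_alignment": 0.5, "continuity": 0.25, "numeric": 0.25}

def composite_reward(components: dict) -> float:
    """Weighted sum of reward components, each assumed to lie in [0, 1]."""
    return sum(WEIGHTS[k] * components[k] for k in WEIGHTS)

def grpo_advantages(rewards: list) -> list:
    """GRPO-style advantage: normalize each reward against its group
    (subtract group mean, divide by group std) instead of using a value model."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

# A group of 4 sampled solutions to one problem, scored by the composite reward.
group = [
    composite_reward({"correctness": 1, "step_alignment": 0.9, "continuity": 1.0, "numeric": 1}),
    composite_reward({"correctness": 0, "step_alignment": 0.4, "continuity": 0.8, "numeric": 0}),
    composite_reward({"correctness": 1, "step_alignment": 0.7, "continuity": 0.9, "numeric": 1}),
    composite_reward({"correctness": 0, "step_alignment": 0.2, "continuity": 0.5, "numeric": 0}),
]
advantages = grpo_advantages(group)  # correct solutions get positive advantage
```

Group-relative normalization means advantages always sum to zero within a group, so the policy is pushed toward the better responses of each sampled batch.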

Abstract

Curriculum learning helps language models tackle complex reasoning by gradually increasing task difficulty. However, it often fails to generate consistent step-by-step reasoning, especially in multilingual and low-resource settings where cross-lingual transfer from English to Indian languages remains limited. We propose IRIS: Interleaved Reinforcement with Incremental Staged Curriculum, a two-axis framework that combines Supervised Fine-Tuning on progressively harder problems (vertical axis) with Reverse Curriculum Reinforcement Learning to reduce reliance on step-by-step guidance (horizontal axis). We design a composite reward combining correctness, step-wise alignment, continuity, and numeric incentives, optimized via Group Relative Policy Optimization (GRPO). We release CL-Math, a dataset of 29k problems with step-level annotations in English, Hindi, and Marathi. Across standard benchmarks and curated multilingual test sets, IRIS consistently improves performance, with strong results on math reasoning tasks and substantial gains in low-resource and bilingual settings, alongside modest improvements in high-resource languages.