ProUIE: A Macro-to-Micro Progressive Learning Method for LLM-based Universal Information Extraction

arXiv cs.CL / 4/14/2026


Key Points

  • The paper introduces ProUIE, a macro-to-micro progressive learning method for LLM-based universal information extraction that aims to improve extraction quality without relying on any information beyond the original training data.
  • ProUIE uses three stages: complete modeling (CM) to learn NER/RE/EE in order of intrinsic difficulty, streamlined alignment (SA) to regularize and simplify structured outputs, and deep exploration (DE) using GRPO (Group Relative Policy Optimization) with stepwise fine-grained rewards.
  • Experiments across 36 public datasets show ProUIE consistently boosts unified extraction performance and outperforms strong instruction-tuned baselines for NER and RE.
  • The method achieves these gains using a smaller backbone and reports clear improvements for large-scale, production-oriented information extraction settings.

Abstract

LLM-based universal information extraction (UIE) methods often rely on additional information beyond the original training data, which increases training complexity yet yields limited gains. To address this, we propose ProUIE, a Macro-to-Micro progressive learning approach that improves UIE without introducing any external information. ProUIE consists of three stages: (i) macro-level Complete Modeling (CM), which learns named entity recognition (NER), relation extraction (RE), and event extraction (EE) in order of their intrinsic difficulty on the full training data to build a unified extraction foundation; (ii) meso-level Streamlined Alignment (SA), which operates on sampled data with simplified target formats, streamlining and regularizing structured outputs to make them more concise and controllable; and (iii) micro-level Deep Exploration (DE), which applies Group Relative Policy Optimization (GRPO) with stepwise fine-grained rewards (SFR) over structural units to guide exploration and improve performance. Experiments on 36 public datasets show that ProUIE consistently improves unified extraction, outperforming strong instruction-tuned baselines on average for NER and RE while using a smaller backbone, and it further demonstrates clear gains in large-scale, production-oriented information extraction.
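To make the Deep Exploration stage concrete, the sketch below illustrates what a stepwise fine-grained reward over structural units and GRPO's group-normalized advantages could look like. The paper does not publish this code; the function names, the per-unit scores (+1.0 / -0.1), and the recall penalty are illustrative assumptions, not ProUIE's actual reward design. Only the group-advantage formula follows the standard GRPO formulation.

```python
# Hypothetical sketch of a stepwise fine-grained reward (SFR) over structural
# units, in the spirit of the abstract. A "unit" could be an (entity, type)
# pair for NER or a (head, relation, tail) triple for RE.

def unit_rewards(predicted_units, gold_units):
    """Score each predicted structural unit individually, so the policy gets
    a per-unit signal instead of a single sequence-level reward.
    Score values are assumptions, not the paper's."""
    gold = set(gold_units)
    rewards = [1.0 if u in gold else -0.1 for u in predicted_units]
    # Penalize missed gold units with one recall term (assumed design choice).
    missed = len(gold - set(predicted_units))
    recall_penalty = -0.5 * missed / max(len(gold), 1)
    return rewards, recall_penalty

def sequence_reward(predicted_units, gold_units):
    """Aggregate the stepwise unit rewards into one scalar for a rollout."""
    rewards, recall_penalty = unit_rewards(predicted_units, gold_units)
    return sum(rewards) / max(len(rewards), 1) + recall_penalty

def group_advantages(rewards):
    """GRPO-style advantages: each rollout's reward is standardized against
    the mean and std of its sampling group (standard GRPO formulation)."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + 1e-8) for r in rewards]
```

Under this assumed scheme, a rollout that extracts ("Paris", "LOC") correctly but mistypes ("Obama", "ORG") earns a positive reward for the first unit, a small penalty for the spurious one, and a recall penalty for the missed gold unit; GRPO then ranks rollouts within each sampled group by these scalar rewards.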