LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning
arXiv cs.AI / 2026/3/24
📰 ニュースSignals & Early TrendsIdeas & Deep AnalysisModels & Research
要点
- The paper introduces LongCat-Flash-Prover, a 560B-parameter open-weights MoE model targeting native formal reasoning in Lean4 using agentic tool-integrated reasoning (TIR).
- It breaks formal reasoning into three capabilities—auto-formalization, sketching, and proving—and uses a Hybrid-Experts Iteration Framework to expand high-quality training trajectories from informal problems to formal statements and full proofs.
- For agentic reinforcement learning on these long-horizon tasks, the authors propose HisPO (Hierarchical Importance Sampling Policy Optimization) to stabilize MoE training via gradient masking that addresses policy staleness and train–inference discrepancies at sequence and token levels.
- To reduce reward hacking, the system includes theorem consistency checks and legality detection mechanisms.
- Experiments report new state-of-the-art results for open-weights models on auto-formalization and theorem proving, including a 97.1% pass rate on MiniF2F-Test with only 72 inference budget per problem and strong performance on ProverBench and PutnamBench.




