LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning
arXiv cs.AI / 3/24/2026
Key Points
- The paper introduces LongCat-Flash-Prover, a 560B-parameter open-weights MoE model targeting native formal reasoning in Lean4 using agentic tool-integrated reasoning (TIR).
- It breaks formal reasoning into three capabilities—auto-formalization, sketching, and proving—and uses a Hybrid-Experts Iteration Framework to expand high-quality training trajectories from informal problems to formal statements and full proofs.
- For agentic reinforcement learning on these long-horizon tasks, the authors propose HisPO (Hierarchical Importance Sampling Policy Optimization) to stabilize MoE training via gradient masking that addresses policy staleness and train–inference discrepancies at sequence and token levels.
- To reduce reward hacking, the system includes theorem consistency checks and legality detection mechanisms.
- Experiments report new state-of-the-art results for open-weights models on auto-formalization and theorem proving, including a 97.1% pass rate on MiniF2F-Test with an inference budget of only 72 attempts per problem, plus strong performance on ProverBench and PutnamBench.
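The hierarchical gradient masking behind HisPO can be illustrated with a minimal sketch. The idea, as summarized above, is to discard gradient contributions at two levels: whole trajectories whose sequence-level importance ratio indicates severe policy staleness, and individual tokens whose ratio falls outside a trust region (e.g. due to train-inference discrepancies). The function name, thresholds, and exact masking rules below are illustrative assumptions, not the paper's actual formulation.

```python
import math

def hispo_masked_token_losses(logp_new, logp_old, advantage,
                              token_eps=0.2, seq_eps=2.0):
    """Hypothetical sketch of hierarchical importance-sampling masking.

    logp_new, logp_old: per-token log-probabilities for one trajectory
    under the current and behavior policies (lists of floats).
    advantage: scalar advantage estimate for the trajectory.
    Returns per-token loss contributions; masked tokens contribute 0.
    """
    # Sequence-level importance ratio: product of token ratios,
    # computed stably as exp of the summed log-prob differences.
    seq_ratio = math.exp(sum(n - o for n, o in zip(logp_new, logp_old)))
    if seq_ratio >= seq_eps:
        # Trajectory is too off-policy (stale): mask the whole sequence.
        return [0.0] * len(logp_new)

    losses = []
    for n, o in zip(logp_new, logp_old):
        ratio = math.exp(n - o)  # token-level importance ratio
        if 1 - token_eps < ratio < 1 + token_eps:
            # Token inside the trust region: keep its gradient signal.
            losses.append(-ratio * advantage)
        else:
            # Token outside the trust region (e.g. large train-inference
            # probability mismatch): zero out its gradient contribution.
            losses.append(0.0)
    return losses
```

With identical policies (all log-prob differences zero), every ratio is 1, so no token is masked and each token contributes `-advantage` to the loss; as the policies drift, first individual tokens and then entire sequences are dropped from the gradient.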