美团开源560B参数定理证明模型,72次推理通过率97.1%刷新开源SOTA

Gate News 消息,3月21日,美团LongCat团队开源LongCat-Flash-Prover,这是一个5600亿参数的MoE模型,专注于形式化定理证明语言Lean4的数学推理任务。模型权重以MIT协议发布,已上线GitHub、Hugging Face和ModelScope。

该模型将形式化推理拆解为三项独立能力:自动形式化(将自然语言数学问题转化为Lean4形式语句)、草图生成(产出引理风格的证明框架)和完整证明生成。三项能力均通过Agent工具集成推理(TIR)与Lean4编译器实时交互验证。

训练方面,团队提出Hybrid-Experts Iteration Framework生成冷启动数据,并在强化学习阶段引入HisPO算法稳定MoE模型的长程任务训练,同时加入定理一致性和合法性检测机制防止reward hacking。

基准测试显示,LongCat-Flash-Prover在开源权重模型中刷新了自动形式化和定理证明两项SOTA。MiniF2F-Test上仅用72次推理即达97.1%通过率,ProverBench和PutnamBench分别达到70.8%和41.5%,每题推理次数不超过220次。

Disclaimer: The information on this page may come from third parties and does not represent the views or opinions of Gate. The content displayed on this page is for reference only and does not constitute any financial, investment, or legal advice. Gate does not guarantee the accuracy or completeness of the information and shall not be liable for any losses arising from the use of this information. Virtual asset investments carry high risks and are subject to significant price volatility. You may lose all of your invested principal. Please fully understand the relevant risks and make prudent decisions based on your own financial situation and risk tolerance. For details, please refer to Disclaimer.
Commento
0/400
Nessun commento