Dear colleagues in Isabelle,
I'm excited to share that our paper Neural Theorem Proving for Verification Conditions: A Real-World Benchmark has been accepted at ICLR'26, one of the top 3 Machine Learning conferences.
Neural Theorem Proving (NTP), which leverages LLMs and other AI technologies for automated theorem proving, has achieved remarkable results in challenging competitions such as the International Mathematical Olympiad (AlphaProver, Aristotle Prover, Seed-Prover).
As a member of the ITP community, I have long hoped that NTP could assist us in our daily work --- freeing us from the heavy lifting of proof engineering, and perhaps even addressing the criticism from major industry companies regarding the automation of ITP-based foundational program verification.
Unfortunately, research driven by the AI community has predominantly focused on mathematical competitions, which diverges from the practical needs of our ITP community. To draw the AI community's attention to our practical needs, we propose a benchmark specifically for verification conditions.
Preliminary experiments on this benchmark show that Isabelle's infrastructure (Sledgehammer) and its LLM-based approach (Minilang) outperform all other models in our study, including those based on Lean and Rocq.
We hope this work can serve as a stepping stone toward establishing Isabelle's leading position in the field of AI-assisted automated theorem proving for program verification.
Cheers, and here's to an even brighter future for Isabelle!
Qiyuan
Last updated: Feb 04 2026 at 02:22 UTC