Stream: Announcements

Topic: Introduce Minilang and our machine learning toolkit


view this post on Zulip Qiyuan Xu (Dec 24 2025 at 15:37):

Merry Christmas, my dear colleagues in Isabelle.

Neural Theorem Proving (NTP) is an automated reasoning technique that applies LLMs or other neural models to generate proofs, offering a promising approach to free ITP users from manual proof construction. While Lean may appear to dominate NTP development in recent years, Isabelle actually led the field as recently as two years ago. When I asked researchers why they have shifted to Lean, they cited Isabelle's lack of adequate infrastructure for machine learning and data mining.

We believe this gap has now been addressed through our development of a comprehensive machine learning toolkit for Isabelle, which includes a REPL, precompiled AFP heap images, and a minimalist proof language specifically designed for machine learning. This language, _Minilang_, aims to reduce the conceptual complexity that LLMs must learn, thereby improving training efficiency. Experiments demonstrate that models trained with the most basic approach (supervised fine-tuning) can prove 79% of PISA problems (a set of randomly selected proof goals from the AFP) within 8 attempts. A translator from Isar to Minilang is also provided, which successfully translates 85% of the proofs in the AFP and the standard library of Isabelle/HOL. This work has been accepted at OOPSLA'26, and a pre-print is available on arXiv.
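For readers less familiar with the metric: "within 8 attempts" means a goal counts as proved if any of 8 independently sampled proofs checks. A minimal sketch of such an evaluation loop, with a hypothetical `try_prove` mock standing in for the actual model-plus-Isabelle pipeline (none of these names come from our toolkit):

```python
import random

def try_prove(goal: str, attempt: int) -> bool:
    """Hypothetical stand-in for one proof attempt; a real pipeline would
    sample a Minilang proof from the model and check it in Isabelle."""
    random.seed(f"{goal}/{attempt}")  # deterministic mock, for illustration only
    return random.random() < 0.3      # pretend per-attempt success rate

def solved_within(goal: str, k: int = 8) -> bool:
    # A goal counts as solved if any of k independent attempts succeeds.
    return any(try_prove(goal, a) for a in range(k))

goals = [f"goal_{i}" for i in range(100)]
rate = sum(solved_within(g) for g in goals) / len(goals)
print(f"solved within 8 attempts: {rate:.0%}")
```

The per-attempt success probability and goal names above are made up; only the shape of the loop matters.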

While Lean currently maintains its lead in AI for mathematical competitions, the landscape differs for practical proof engineering tasks (e.g., those from Isabelle's AFP and Lean's mathlib). On LeanDojo, a Lean counterpart of PISA, the best reported result is 60% (by Alchemy), while Minilang's most basic model (plain SFT with a naive prompt-context setting) reaches 79% on PISA. Although these results come from different benchmarks and are not directly comparable, Minilang still positions Isabelle once again at the forefront of AI-driven theorem proving for practical proof engineering.

We envision Minilang as a foundational component of the AI toolkit ecosystem for Isabelle. To this end, we have open-sourced all code and data (see the end of our paper) and are committed to providing full technical support to users of Minilang and our toolkit.

Our team will continue developing AI provers and tools for Isabelle, with the ultimate goal of providing production-ready, push-button solutions. We will post updates on our progress.

Long live Isabelle, and may it continue to prosper.
Qiyuan

view this post on Zulip Qiyuan Xu (Jan 12 2026 at 16:05):

@Mario Xerxes Castelán Castro your opinion is welcome :)

view this post on Zulip Yosuke Ito (Jan 14 2026 at 01:21):

@Qiyuan Xu
I'm not an expert on AI, but your research is interesting to me.

view this post on Zulip Mario Xerxes Castelán Castro (Jan 15 2026 at 01:23):

Qiyuan Xu said:

Mario Xerxes Castelán Castro opinion is welcome :)

view this post on Zulip Qiyuan Xu (Jan 23 2026 at 18:30):

@Mario Xerxes Castelán Castro, I will respond to you later, but let me share a progress update first.

The recent trend in AI4Math is to use non-finetuned agents, and Numina Agent has hit SOTA on PutnamBench.
This could be a great opportunity for Isabelle. Previously, a major pain point for me was the lack of math-competition data for Isabelle: the equivalent datasets for Lean were created by hiring experts at a cost of millions of dollars (based on what I've gathered from some AI companies). Without such data, it was nearly impossible to train LLMs on math-competition problems in Isabelle.

But now, agents built on general-purpose LLMs (Claude, Gemini) without any fine-tuning have surprisingly achieved SOTA, and Numina Agent is open-source. This means I can directly adapt and improve its implementation for Minilang. Since Minilang has better infrastructure than Lean (as an Isabelle hacker, I am confident in my work), the agent should perform even better there. This might just correct some misconceptions held by certain overconfident folks in the AI4Math community.

I'm currently visiting MBZUAI and leveraging their nearly unlimited CPU clusters and GPUs for data extraction and training. I hope to share more good news soon.
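To make the "non-finetuned agent" idea concrete, here is a minimal sketch of the basic feedback loop such agents use: propose a proof with an off-the-shelf LLM, check it, feed the checker's error back, and retry. Both `llm_complete` and `isabelle_check` below are hypothetical mocks, not Numina Agent's or our toolkit's actual API:

```python
def llm_complete(prompt: str) -> str:
    # Hypothetical stand-in for a general-purpose LLM call (e.g. Claude,
    # Gemini); here it just cycles through canned candidate proofs.
    candidates = ["by auto", "by simp", "by blast"]
    return candidates[prompt.count("Attempt failed") % len(candidates)]

def isabelle_check(goal: str, proof: str):
    # Hypothetical stand-in for replaying the proof in an Isabelle REPL.
    ok = proof == "by simp"
    return ok, "" if ok else f"{proof!r} did not close the goal"

def prove(goal: str, max_rounds: int = 4):
    """Agent loop: ask for a proof, check it, append the failure message
    to the context, and retry up to max_rounds times."""
    transcript = [f"Goal: {goal}"]
    for _ in range(max_rounds):
        attempt = llm_complete("\n".join(transcript))
        ok, feedback = isabelle_check(goal, attempt)
        if ok:
            return attempt
        transcript.append(f"Attempt failed: {feedback}")
    return None
```

A real agent adds retrieval, sketching, and richer error parsing on top, but this loop is the core shape, and it needs no fine-tuning at all.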


Last updated: Feb 16 2026 at 08:52 UTC