From: "\"Bisping, Benjamin\"" <cl-isabelle-users@lists.cam.ac.uk>
Hello everybody,
This week, I plan to convince a computer science audience that interactive theorem proving is the future of mathematics and verification. (I'm sure many on this mailing list know this situation.) I intend to build some arguments around Terence Tao getting tens of thousands of people to watch him use Lean+Copilot this year <https://www.youtube.com/watch?v=cyyR7j2ChCI>. As I will employ Isabelle/HOL for the demonstration, I strongly anticipate questions about how to hook it up with GitHub Copilot.
I tried it myself, and Isabelle+Copilot seems highly non-trivial due to the incompatibilities introduced by Microsoft on one side and the Isabelle/Codium packaging on the other.* This is a pity because, in principle, this year's LLM coding assistants appear to finally be quite decent at finding Isabelle proofs. If I open Isabelle/HOL theories in VS Code, GPT-5.1-Codex can solve tricky stuff, for instance, finding fitting induction invariants.
But of course, the workflow of co-editing theories between VSCode (or ChatGPT prompts) and jEdit is too clunky to be proposed in my “sales pitch” for ITPs. And if there's no easy automation for writing proofs in Isabelle, but there is in Lean, I already know how 100% of the listeners will decide which of the two systems to look into.
I wonder whether people here have positive experiences with / suggestions for LLM-supported proof generation in Isabelle that I could report to advertise it?**
Kind regards,
Ben
* Prior 2023 hacky solution for Copilot in Isabelle's VSCodium: https://isabelle.systems/zulip-archive/stream/202961-General/topic/Isabelle.2FVSCode.20Github.20Copilot.html . The 2025 message on the topic to this mailing list went unanswered: https://lists.cam.ac.uk/sympa/arc/cl-isabelle-users/2025-05/msg00002.html
** Prior discussion about the discontinued Isabelle plugin for VS Code, which otherwise would at least be some answer: https://lists.cam.ac.uk/sympa/arc/cl-isabelle-users/2023-10/msg00005.html
From: Makarius <makarius@sketis.net>
On 08/12/2025 21:04, "Bisping, Benjamin" (via cl-isabelle-users Mailing List) wrote:
And if there's no easy automation for writing proofs in Isabelle, but there is in Lean, I already know how 100% of the listeners will decide which of the two systems to look into.
Oh dear, better use Lean then.
Side-remark: The tool of choice for proof automation is called "Sledgehammer".
Makarius
From: Dominic Mulligan <cl-isabelle-users@lists.cam.ac.uk>
Dear Benjamin,
You may find the I/Q plugin interesting. It exposes Isabelle as an MCP server to LLMs and allows the agent to interactively edit files, query proof states, and so on.
The code is available here: https://github.com/awslabs/autocorrode
It’s mainly been tested with Amazon’s Kiro CLI agent (formerly called Q CLI), but I believe at least one other person has used it in “anger” with another agent.
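For reference, MCP-capable agents typically register such a server via a JSON configuration in the common "mcpServers" format; a hypothetical sketch with placeholder command and arguments (the actual invocation for I/Q is whatever the repository documents):

  {
    "mcpServers": {
      "isabelle-iq": {
        "command": "/path/to/iq-server-launcher",
        "args": ["--isabelle-home", "/path/to/Isabelle"]
      }
    }
  }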
Thanks,
Dominic
From: Wolfram Kahl <kahl@cas.mcmaster.ca>
On Mon, Dec 08, 2025 at 09:07:26PM +0100, Makarius wrote:
Side-remark: The tool of choice for proof automation is called
"Sledgehammer".
Does that SOUND elegant and/or sophisticated to you?
Perhaps a marketing problem... ;-)
Wolfram
From: Makarius <makarius@sketis.net>
On 08/12/2025 21:16, Wolfram Kahl wrote:
On Mon, Dec 08, 2025 at 09:07:26PM +0100, Makarius wrote:
Side-remark: The tool of choice for proof automation is called "Sledgehammer".
Does that SOUND elegant and/or sophisticated to you?
Yes. It has become a well-established name in the ITP community. There are also many other "Hammer" projects, but none as successful.
Perhaps a marketing problem... ;-)
Who cares about marketing? I don't.
Makarius
From: Yakoub Nemouchi <y.nemouchi@gmail.com>
There is a DARPA program called PROVERS where I know some people are trying to come up with such a feature for Isabelle. You may want to reach out to performers on DARPA PROVERS. I believe the Stanford University team is cooking up something for Isabelle.
Still, computer scientists have to take a minute and remember that undecidable problems, such as finding proofs in HOL, finding invariants, or even writing a complete specification, cannot be fully automated. This is what we learned in the third semester of CS class, specifically in theory of computation, more specifically in computability theory.
The best solution found so far is ITPs themselves, where we have humans in the loop to create computer-assisted abstractions, combined with symbolic AI tools (i.e., automated reasoning via, for example, natural deduction) to increase automation.
LLMs are the biggest imposters I have ever seen. They cannot be trusted even to handle deterministic and repetitive processes, let alone trusted to do reasoning.
My 2 cents.
Yakoub.
From: Lawrence Paulson <lp15@cam.ac.uk>
It is quite hard to beat the simple combination of the structured Isar language and sledgehammer. First, just try sledgehammer. If it fails, think of an intermediate formula and try using sledgehammer to close both gaps. If not, continue with more intermediate formulas. Many straightforward proofs can be found that way.
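A minimal sketch of that workflow on a toy lemma (the one-line justifications are the kind of proofs sledgehammer typically suggests):

  lemma "rev (rev xs) = xs"
  proof (induct xs)
    case Nil
    show ?case by simp
  next
    case (Cons a xs)
    (* intermediate formula; sledgehammer closes each gap with a one-liner *)
    have "rev (rev (a # xs)) = a # rev (rev xs)" by simp
    then show ?case by (simp add: Cons.hyps)
  qed

If sledgehammer fails on ?case directly, the intermediate "have" splits the goal into two smaller obligations that it can usually close.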
Larry
On 8 Dec 2025 at 20:21 +0000, Yakoub Nemouchi <y.nemouchi@gmail.com>, wrote:
The best solution found so far is ITPs themselves, where we have humans in the loop to create computer-assisted abstractions, combined with symbolic AI tools (i.e., automated reasoning via, for example, natural deduction) to increase automation.
From: Yakoub Nemouchi <y.nemouchi@gmail.com>
… and because the word AI is making the trend these days: Sledgehammer is your symbolic AI tool, and humans bring their neurons. So the interaction between a human and an ITP such as Isabelle gives you the most powerful neuro-symbolic AI tool.
From: Kevin Kappelmann <cl-isabelle-users@lists.cam.ac.uk>
Hi Ben,
You can use Isabelle/VSCode (i.e. run isabelle vscode instead of isabelle jedit) and then install any extension you like from the Open VSX marketplace [1], e.g. Continue [2], which can be connected to GPT-5.1 (I believe).
Best wishes,
Kevin
[1] https://open-vsx.org/
[2] https://open-vsx.org/extension/Continue/continue
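Concretely, that setup looks roughly like this, with Demo.thy as a placeholder theory and assuming isabelle vscode forwards standard codium options (otherwise install the extension via the Extensions view inside the editor):

  # install the Continue extension from Open VSX
  isabelle vscode --install-extension Continue.continue
  # open a theory in Isabelle/VSCode
  isabelle vscode Demo.thy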
From: Changyang Zeng <cl-isabelle-users@lists.cam.ac.uk>
Glad to see that some of us are building agents with Isabelle. I am just wondering: what kinds of software or hardware do people tend to prove correct with Isabelle?
Thanks,
Changyang
From: Mete Polat <metepolat2000@gmail.com>
Currently building an agentic IDE at omergence.dev with Isabelle in the
backend and GitHub Copilot built in. Hit me up if you have any questions or
are interested.
From: Jonathan Julián Huerta y Munive <jonjulian23@gmail.com>
Hi Benjamin,
You might want to try the resources in the DeepIsaHOL project
<https://github.com/yonoteam/DeepIsaHOL>.
Currently it has support for ChatGPT, Gemini, Ollama, and HuggingFace
Transformers.
It is very much in a prototypical state and will be upgraded in the future
to adhere to Isabelle standards. However, for the demonstration purposes
you mention, it should suffice. For instance, here is a video
<https://youtu.be/Uq7I5sGaHo0> showing ChatGPT proving simple theorems in
Isabelle. The result is still interactive theorem proving, but the
DeepIsaHOL project has support
<https://ieeexplore.ieee.org/document/11024329> for automating the manual
parts at the end of the video.
Best wishes,
Jonathan
--
Jonathan Julian Huerta y Munive
Researcher on formal methods and automated reasoning
From: Yakoub Nemouchi <y.nemouchi@gmail.com>
It would be great if someone holding a position in academia did a study specifically on ITPs: comparing the amount of energy needed for Sledgehammer to generate a proof with the amount of energy consumed to train GPT agents and prompt them to generate the same proof, and studying the impact of the energy consumption from prompting and training GPT agents on societies, communities, and the environment.
Best wishes,
Yakoub.
From: Lawrence Paulson <lp15@cam.ac.uk>
Sledgehammer runs easily on a laptop, and its energy consumption is essentially zero compared with the amount of energy consumed to train GPT agents.
Larry
From: Yakoub Nemouchi <y.nemouchi@gmail.com>
That was exactly my point, Larry.
Someone needs to shine a light on these GPT agents with data as evidence.
Sledgehammer used to give me an Isabelle proof with less energy and in less time for quite complicated statements. Most of the time, when the statement is complicated, the GPT agents will consume more energy and more time without even generating a proof. And because GPT agents are good liars, they still generate something and call it “a profile of a proof”.
Not only that: when I prompt them to translate some spec written in natural language into Isabelle/HOL, even if they generate something that works, it always looks clunky, is not generic, and could be better here and there. In effect, it adds an unnecessary layer of distraction and wasted time for users.
Maybe those GPT agents are good for certain tasks, but definitely not for programming, let alone reasoning.
From: Lawrence Paulson <lp15@cam.ac.uk>
Useful to know. I find them good for finding things in our libraries (both lemmas and obscure syntax). They also generate pretend proofs.
Larry
From: Yakoub Nemouchi <y.nemouchi@gmail.com>
I agree. Using them “carefully” as an advanced search engine can be helpful, especially now that we have a huge AFP repository: not all users are aware of it, nor do they set it up when they download the main Isabelle distribution.
My answers are based on the assumption, coming from this thread, that people are using them to find proofs in HOL!
Best wishes,
Yakoub.
From: Josef Urban <josef.urban@gmail.com>
I said recently in a talk that the ATP and hammer algorithms were "generative AI" long before LLMs appeared [1]. For someone using a hammer (or an efficient tactical prover like TacticToe/Tactician) routinely to get correct proofs, "ChatGPT as a prover" felt like an expensive joke when it first appeared. (I have started to call ChatGPT a "gateway drug to ML" :).
I would hope the "magic of scaling up" hype is mostly over, and even the LLM/DL community has recently (e.g. with the DeepSeek paper) acknowledged that efficient search/reasoning and combinations of algorithms matter a lot (cf. "agentic AI" as the next emerging buzzword).
Projects like Jonathan's should allow people in academia to eventually
rigorously evaluate which (combinations of) algorithms and architectures
work better than others. There has been plenty of decent and
interesting/innovative work done continuously by the AITP community in this
direction (e.g. [2,3] pointing to nice non-Transformer architectures).
Hopefully, some of the "super excited" mathematicians will eventually
realize that AI/TP is also a science ;-).
Josef
[1] https://github.com/JUrban/LMvsATP
[2] https://arxiv.org/abs/2401.02949
[3] https://arxiv.org/abs/2503.07792
From: "\"Lammich, Peter (UT-EEMCS)\"" <cl-isabelle-users@lists.cam.ac.uk>
I've seen ChatGPT hallucinate lemma and function names that do not exist, and come up with pretty random-looking 'proofs' that obviously don't work. Is there a better model for that than GPT?
Peter
From: Yakoub Nemouchi <y.nemouchi@gmail.com>
I think this is the right approach. Before introducing such a tool to students, it is always good to remind them about the foundations of computation, and to have the following reminder/disclaimer at the beginning:
“In CS there is a set of problems called undecidable problems that cannot be automated by any kind of algorithm, whether it is AI-based or not. For example, there is no algorithm that systematically provides a yes or no answer to all statements written in HOL and automatically determines whether a proof exists, i.e., that automatically demonstrates whether the statement is true or false for all HOL formulas. The best solution found so far is ITPs themselves (including their backends consisting of automated reasoning tools). Now we have a new expensive toy in town, which is less efficient and way more taxing for society, the environment, and communities. If you guys really want to use it anyway, here is how. And by the way, this is not the only form of AI that exists; symbolic AI (i.e., automated reasoning tools) has existed for at least the last 50 years.”
From: Makarius <makarius@sketis.net>
On 08/12/2025 21:07, Makarius wrote:
better use Lean then.
More side-remarks: The Lean community has enough money and manpower to follow the latest and greatest trends quickly, even if they turn out to be dead ends. Thus it is always interesting to attend the virtual Lean Together event, every year in January. Non-Lean users are explicitly welcome.
The website for Jan-2026 is here: https://leanprover-community.github.io/lt2026
The past event from Jan-2025 is documented here:
https://leanprover-community.github.io/lt2025/schedule.html
This particular talk from 15-Jan-2025 is relevant to the current mail thread:
"""
Jason Rute (IBM Research)
Title: The last mile: How do we make AI theorem provers which work in the real
world for real users and not just on benchmarks?
Abstract: As AI research progresses, we are seeing massive gains on formal
math benchmarks like MiniF2F and challenges like the IMO. However, these gains
are not directly accessible to the typical ITP user, nor are they used to
build formal mathematical libraries. Is this because the technology is not as
good as promised or because there is a disconnect between research and the
day-to-day needs of users? I propose that even today, we have sufficient AI
technology to enhance the experience of proof assistant users significantly. I
approach this from the lens of product-focused software engineering. How do we
bridge “the last mile” to make a useful product for the end user?
"""
E.g. see this introductory slide "Why don't we have AI assistants for real ITP
users?" https://youtu.be/Yr8dzfVkeHg?t=141
I attended the talk virtually on that day. To me, the abstract sounded more optimistic than the impression given in the talk. I mentally ticked off the topic with the note: "Check again in 3-5 years, whether they have something that works properly, or have discovered that the very approach is a bad idea."
It will still be interesting to see what will be presented in a few weeks at the virtual Lean Together 2026.
Makarius
From: Yakoub Nemouchi <y.nemouchi@gmail.com>
Peter, if you want something serious, I suggest you build your GPT agent based on Mistral models. I trust tech coming from France because the French education system has a culture of looping reasoning techniques for computation into its CS classes, rather than teaching students engineering techniques that rely only on brute-force computation.
I always keep an eye on tech products coming from researchers or companies based in France. They always end up finding a balance between powerful computation and resource efficiency for their products, because they loop reasoning into their R&D. Maybe my perception is wrong, but this is my personal experience with other tech products.
PS: I am an Isabelle user, not a Coq user. But, for example, Isabelle/Naproche is based on a CNL model from a French research institution. I am sure Sledgehammer leverages a lot of work by French researchers.
Best wishes,
Yakoub.
From: waldinge <cl-isabelle-users@lists.cam.ac.uk>
Plugging my recent paper on extracting programs from automatically discovered proofs. (No LLMs used, but the matter is discussed).
https://arxiv.org/abs/2508.11136
From: Yakoub Nemouchi <y.nemouchi@gmail.com>
Josef -
Just for clarification: I am not questioning the value of Jonathan’s work or others’ work.
I am just trying to create some awareness in the CS community in the middle of this LLM noise.
Best,
Yakoub.
On Wed, Dec 10, 2025 at 1:10 PM Josef Urban <josef.urban@gmail.com> wrote:
I said recently in a talk that the ATP and hammer algorithms were
"generative AI" long before LLMs appeared [1]. For someone using a hammer
(or an efficient tactical prover like TacticToe/Tactician) routinely to get
correct proofs, "Chatgpt as a prover" felt like an expensive joke when it
first appeared. (I have started to call Chatgpt a "gateway drug to ML" :).I would hope the "magic of scaling up" hype is mostly over and even the
LLM/DL community has recently (e.g. with the Deepseek paper) acknowledged
that efficient search/reasoning and combinations of algorithms matter a lot
(cf "agentic AI" as the next emerging buzzword).Projects like Jonathan's should allow people in academia to eventually
rigorously evaluate which (combinations of) algorithms and architectures
work better than others. There has been plenty of decent and
interesting/innovative work done continuously by the AITP community in this
direction (e.g. [2,3] pointing to nice non-Transformer architectures).
Hopefully, some of the "super excited" mathematicians will eventually
realize that AI/TP is also a science ;-).Josef
[1] https://github.com/JUrban/LMvsATP
[2] https://arxiv.org/abs/2401.02949
[3] https://arxiv.org/abs/2503.07792On Wed, Dec 10, 2025, 7:47 PM Lawrence Paulson <lp15@cam.ac.uk> wrote:
Sledgehammer runs easily on a laptop, and its energy consumption is
essentially zero compared with the amount of energy consumed to train GPT
agents.

Larry
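
For reference, invoking Sledgehammer is a single diagnostic command inside a
proof. Here is a minimal sketch (toy lemma and theory name chosen for
illustration, not from this thread) of what the call looks like in an
Isabelle/HOL theory:

  theory Sledgehammer_Demo
    imports Main
  begin

  lemma "rev (rev xs) = xs"
    sledgehammer
      (* queries external ATPs/SMT solvers in the background; on success
         it prints a one-line proof to paste in, e.g. "by simp" *)
    by simp

  end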
On 10 Dec 2025 at 03:01 +0000, Yakoub Nemouchi <y.nemouchi@gmail.com>,
wrote:

It would be great if someone holding a position in academia did a study
specifically on ITPs: comparing the amount of energy Sledgehammer needs to
generate a proof with the amount of energy consumed to train GPT agents and
prompt them to generate the same proof, and studying the impact of the energy
consumption from prompting and training GPT agents on societies, communities,
and the environment.

Best wishes,
Yakoub.
On Tue, Dec 9, 2025 at 10:06 AM Jonathan Julián Huerta y Munive <
jonjulian23@gmail.com> wrote:

Hi Benjamin,
You might want to try the resources in the DeepIsaHOL project
<https://github.com/yonoteam/DeepIsaHOL>.
Currently it has support for ChatGPT, Gemini, Ollama, and HuggingFace
Transformers.
It is very much in a prototypical state and will be upgraded in the
future to adhere to Isabelle standards. However, for the demonstration
purposes you mention, it should suffice. For instance, here is a video
<https://youtu.be/Uq7I5sGaHo0> showing ChatGPT proving simple theorems
in Isabelle. The result is still interactive theorem proving, but the
DeepIsaHOL project has support
<https://ieeexplore.ieee.org/document/11024329> for automating the
manual parts at the end of the video.

Best wishes,
Jonathan
From: Harry Butterworth <heb1001@gmail.com>
I think the main thing missing from historical AI approaches in general is
the ability to create higher-level abstractions from observations.
IMO, creating higher-level abstractions is a significant part of what maths
is about: observe a pattern, make a formal statement of the pattern, search
for a proof to determine if the pattern always holds.
If you consider that mathematical expressions and statements are enumerable,
then the interesting proofs are those that appear on the shortest paths
linking nodes of the enumeration tree, where proofs linking nodes closer to
the root are more interesting and useful (because they are more generic)
than proofs linking nodes at more remote levels (where they are more
specific); see the sketch below.
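
As a toy illustration of "enumerable" (an editorial sketch, not from this
thread): any finitely branching term language over countable atoms, like the
hypothetical Isabelle/HOL datatype below, can be enumerated by size, with the
size function playing the role of distance from the root of the tree.

  (* hypothetical toy term language; Var, App, and esize are
     illustrative names *)
  datatype expr =
      Var nat
    | App expr expr

  (* "depth in the tree" in the above sense: shorter statements
     sit nearer the root *)
  fun esize :: "expr ⇒ nat" where
    "esize (Var _) = 1"
  | "esize (App s t) = 1 + esize s + esize t"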
Taking Euler’s identity as an example: it’s a statement that sits near the
root of the tree (because it’s short), linking expressions that also sit
near the root (because they are short too), and it’s an abstraction (a
shortcut) for its own proof, which, in contrast, is a more convoluted path
that climbs high up into the branches of the tree before coming back down
again.
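
For concreteness, the identity in question, together with the standard route
to it (Euler's formula, itself usually proved via the power series of exp,
which is the "convoluted path" above):

  \[ e^{i\pi} + 1 = 0, \qquad \text{following from } e^{ix} = \cos x + i \sin x \text{ at } x = \pi. \]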
The “noise” about LLMs has arisen because they are demonstrating the
ability to create higher-level abstractions from observations (the training
data) during self-supervised learning, without requiring explicit human
input. This is abductive inference, and it’s something that historical
approaches can’t do (AFAIK).
LLMs are not necessarily the best architecture for abductive inference, and
the scaffolding around current LLMs is not necessarily particularly
appropriate for doing mathematics, but LLMs do represent a step change in
capability, in the right direction.
Undecidability and Gödel’s theorems do not prevent computers from becoming
superhuman at mathematics (even if you believe in some kind of substrate
dependence, which I don’t, it is possible to create biological computers).
Another way of looking at Gödel’s theorems is that every undecidable
proposition creates two new branches within mathematics, and that this
branching tree is provably infinite. You wouldn’t want it any other
way: there is an isomorphism between reality and a mathematical
description of reality, and who would choose to live in a finite universe?
In fact, if you subscribe to the simulation hypothesis (and consequently
the dual-aspect theory of information), the entire universe may simply be
the internal perspective of a theorem prover currently working on a
statement which is isomorphic to me winning this argument about the
significance of LLMs: a statement somewhere almost inconceivably high up
in the branches of highly specific and asymptotically useless mathematics.
Enjoy!