From: Yakoub Nemouchi <y.nemouchi@gmail.com>
I am experiencing similar effects (more to come soon).
I still hold to my opinion about the cost of training these things. If we
keep saying that there is a problem at many levels with the cost of
training LLMs, I am sure a student in one of our CS classes will work on it
and find a way to train the same model at far less cost.
That’s the message we should keep communicating to CS students.
Also, it is still not solving undecidable problems! As long as we stay away
from algebraic structures based on HOL classes and locales, it is doing
well on well-known theorems!
I am starting to see more transparency, specifically from Meta and Google,
about the cost of training their models, which I like a lot.
I hope Anthropic follows the same culture and starts publishing numbers
about the cost of training its models. And I admit, for now Claude Opus
4.6 is impressing me in many ways.
Best wishes,
Yakoub.
On Wed, Mar 4, 2026 at 5:59 AM Sergey Tverdyshev <tvsergey@googlemail.com>
wrote:
I think it is worth posting this one here.
Claude’s Cycles
Don Knuth, Stanford Computer Science Department (28 February 2026; revised
02 March 2026)
Shock! Shock! I learned yesterday that an open problem I’d been working on
for several weeks had just been solved by Claude Opus 4.6 — Anthropic’s
hybrid reasoning model that had been released three weeks earlier! It seems
that I’ll have to revise my opinions about “generative AI” one of these
days. What a joy it is to learn not only that my conjecture has a nice
solution but also to celebrate this dramatic …
claude-cycles <https://cs.stanford.edu/~knuth/papers/claude-cycles.pdf>
PDF Document · 122 KB
--
Dr. Sergey Tverdyshev
PS: sent from phone with apologies for brevity, typos, and
auto-corrections.
On 1. Mar 2026, at 20:46, Harry Butterworth <heb1001@gmail.com> wrote:
I pursued this a little with Gemini and managed to fix some of the more
obvious BS and get it to make some “testable predictions” :joy:
https://share.google/aimode/vEVuuSa9eCX4Qo8C8
tl;dr: Pity the poor physicists for they work in the face of an unyielding
“category error”. Worship the logicians for you are the gods of your
fractal domain.
On Wed, 25 Feb 2026 at 12:01, Harry Butterworth <heb1001@gmail.com> wrote:
The last couple of paragraphs in my previous email were a bit
happy-clappy so I spent a little time trying to reconcile them with quantum
mechanics and general relativity.
I read Tegmark’s “Our Mathematical Universe”, which contains a significant
amount of what follows, and I’m aware of Wolfram’s Ruliad (which obviously
contains everything, somewhere).
I wrote some draft emails with the resulting ideas but eventually just
fed them to Gemini to try to get it to roll them into something coherent.
Gemini goes off the rails a bit in places and, unsurprisingly, there’s a
fair amount of questionable BS. I pushed back a little at the end.
Overall, I think it did a reasonable job of turning my input into something
less opaque.
At least, it serves as an example of the need for a significant amount of
work after abductive inference to determine whether you have anything of
value.
https://share.google/aimode/lVcN4fqShhzve2PCz
Harry
On Thu, 11 Dec 2025 at 15:18, Harry Butterworth <heb1001@gmail.com>
wrote:
I think the main thing missing from historical AI approaches in general
is the ability to create higher level abstractions from observations.
IMO, creating higher level abstractions is a significant part of what
maths is about: observe a pattern, make a formal statement of the pattern,
search for a proof to determine if the pattern always holds.
If you consider that mathematical expressions and statements are
enumerable, then the interesting proofs are those that appear on the
shortest paths linking nodes of the tree of the enumeration, where proofs
linking nodes closer to the root of the tree are more interesting and
useful (because they are more generic) than proofs linking nodes at more
remote levels (where they are more specific).
Taking Euler’s identity as an example: it’s a statement that sits near
the root of the tree (because it’s short) that links expressions that also
sit near the root of the tree (because they are short too), and it’s an
abstraction — a shortcut — for its own proof which, in contrast, is a more
convoluted path that climbs high up into the branches of the tree before
coming back down again.
The “noise” about LLMs has arisen because they are demonstrating the
ability to create higher level abstractions from observations (the training
data) during self-supervised learning without requiring explicit human
input. This is abductive inference, and it’s something that historical
approaches can’t do (AFAIK).
LLMs are not necessarily the best architecture for abductive inference,
and the scaffolding around current LLMs is not necessarily particularly
appropriate for doing mathematics, but LLMs do represent a step change in
the right direction in terms of capability.
Undecidability and Gödel’s theorems do not restrict computers from
becoming superhuman at mathematics (even if you believe in some kind of
substrate dependence, which I don’t, it is possible to create biological
computers).
Another way of looking at Gödel’s theorems is that every undecidable
proposition creates two new branches within mathematics and that this
branching tree is provably infinite. You wouldn’t want it any other way:
there is an isomorphism between reality and a mathematical description of
reality, and who would choose to live in a finite universe?
In fact, if you subscribe to the simulation hypothesis (and
consequently the dual-aspect theory of information), the entire universe
may simply be the internal perspective of a theorem prover currently
working on a statement which is isomorphic to me winning this argument
about the significance of LLMs. A statement somewhere almost inconceivably
high up in the branches of highly specific and asymptotically useless
mathematics.
Enjoy!
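[Editor's note: the tree-of-enumeration picture above can be made concrete with a toy sketch. The grammar, atoms, and size measure below are illustrative choices of the editor, not anything from the thread; the point is only that each level of the enumeration grows explosively, so statements linking short expressions near the root are rare and correspondingly general.]

```python
from itertools import product

# Toy enumeration of expressions over a tiny grammar, ordered by size.
# "Size" = number of atoms; smaller expressions sit nearer the root of
# the enumeration tree. (Hypothetical grammar, for illustration only.)
ATOMS = ["0", "1", "x"]
OPS = ["+", "*"]

def expressions(size):
    """Yield every expression built from exactly `size` atoms."""
    if size == 1:
        yield from ATOMS
    else:
        for left in range(1, size):
            for l, op, r in product(expressions(left), OPS,
                                    expressions(size - left)):
                yield f"({l} {op} {r})"

# Each level is far larger than the last, so a short statement that links
# short expressions is a rare and highly generic shortcut.
counts = [sum(1 for _ in expressions(n)) for n in range(1, 5)]
print(counts)  # [3, 18, 216, 3240]
```

In this picture, a short identity such as Euler’s links small nodes of the enumeration directly, while its proof is a long path through much larger levels of the tree.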
On Thu, 11 Dec 2025 at 12:22, Yakoub Nemouchi <y.nemouchi@gmail.com>
wrote:
Josef -
Just for clarification: I am not questioning the value of Jonathan’s
work or others’ work.
I am just trying to create some awareness in the CS community in the
middle of this LLM noise.
Best,
Yakoub.
On Wed, Dec 10, 2025 at 1:10 PM Josef Urban <josef.urban@gmail.com>
wrote:
I said recently in a talk that the ATP and hammer algorithms were
"generative AI" long before LLMs appeared [1]. For someone using a hammer
(or an efficient tactical prover like TacticToe/Tactician) routinely to get
correct proofs, "ChatGPT as a prover" felt like an expensive joke when it
first appeared. (I have started to call ChatGPT a "gateway drug to ML" :).
I would hope the "magic of scaling up" hype is mostly over and even
the LLM/DL community has recently (e.g. with the DeepSeek paper)
acknowledged that efficient search/reasoning and combinations of algorithms
matter a lot (cf. "agentic AI" as the next emerging buzzword).
Projects like Jonathan's should allow people in academia to eventually
rigorously evaluate which (combinations of) algorithms and architectures
work better than others. There has been plenty of decent and
interesting/innovative work done continuously by the AITP community in this
direction (e.g. [2,3], pointing to nice non-Transformer architectures).
Hopefully, some of the "super excited" mathematicians will eventually
realize that AI/TP is also a science ;-).
Josef
[1] https://github.com/JUrban/LMvsATP
[2] https://arxiv.org/abs/2401.02949
[3] https://arxiv.org/abs/2503.07792
On Wed, Dec 10, 2025, 7:47 PM Lawrence Paulson <lp15@cam.ac.uk> wrote:
Sledgehammer runs easily on a laptop, and its energy consumption is
essentially zero compared with the amount of energy consumed to train GPT
agents.
Larry
On 10 Dec 2025 at 03:01 +0000, Yakoub Nemouchi <y.nemouchi@gmail.com>,
wrote:
It would be great if someone holding a position in academia did a
study specifically on ITPs: comparing the amount of energy needed for
Sledgehammer to generate a proof with the amount of energy consumed to
train GPT agents and prompt them to generate the same proof, and studying
the impact of the energy consumption from prompting and training GPT
agents on societies, communities and the environment.
Best wishes,
Yakoub.
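[Editor's note: a first back-of-envelope version of such a comparison can be sketched as follows. All figures are illustrative assumptions by the editor, not measurements from the thread: a laptop drawing about 60 W during a 30-second Sledgehammer call, and a training run on the order of 1 GWh, roughly the magnitude of public estimates for GPT-3-scale training.]

```python
# Back-of-envelope energy comparison: a single Sledgehammer call on a laptop
# versus the one-off cost of training a large model. All numbers here are
# illustrative assumptions, not measurements.

LAPTOP_POWER_W = 60              # assumed laptop draw during the call
SLEDGEHAMMER_SECONDS = 30        # assumed wall-clock time per invocation
TRAINING_ENERGY_KWH = 1_000_000  # ~1 GWh, the order of magnitude in public
                                 # estimates for GPT-3-scale training runs

# energy (kWh) = power (W) * time (s) / 3600 (s/h) / 1000 (W/kW)
sledgehammer_kwh = LAPTOP_POWER_W * SLEDGEHAMMER_SECONDS / 3600 / 1000
calls_per_training_budget = TRAINING_ENERGY_KWH / sledgehammer_kwh

print(f"Per Sledgehammer call: {sledgehammer_kwh} kWh")
print(f"Calls per training budget: {calls_per_training_budget:.0f}")
```

Under these assumptions one training run’s budget covers on the order of two billion Sledgehammer calls; a serious study would also amortize the training cost over all uses of the model and add per-prompt inference energy, which this sketch ignores.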
On Tue, Dec 9, 2025 at 10:06 AM Jonathan Julián Huerta y Munive
<jonjulian23@gmail.com> wrote:
Hi Benjamin,
You might want to try the resources in the DeepIsaHOL project
<https://github.com/yonoteam/DeepIsaHOL>.
Currently it has support for ChatGPT, Gemini, Ollama,
and HuggingFace Transformers.
It is very much in a prototypical state and will be upgraded in the
future to adhere to Isabelle standards. However, for the demonstration
purposes you mention, it should suffice. For instance, here is a
video <https://youtu.be/Uq7I5sGaHo0> showing ChatGPT proving simple
theorems in Isabelle. The result is still interactive theorem proving, but
the DeepIsaHOL project has support for automating the manual parts at the
end of the video.
Best wishes,
Jonathan
From: Sergey Tverdyshev <cl-isabelle-users@lists.cam.ac.uk>
I think it is worth posting this one here.
Claude’s Cycles
Don Knuth, Stanford Computer Science Department (28 February 2026; revised 02
March 2026)
Shock! Shock! I learned yesterday that an open problem I’d been working on for
several weeks had just been solved by Claude Opus 4.6 — Anthropic’s hybrid
reasoning model that had been released three weeks
earlier! It seems that I’ll have to revise my opinions about “generative AI”
one of these days. What a joy
it is to learn not only that my conjecture has a nice solution but also to
celebrate this dramatic
claude-cycles <https://cs.stanford.edu/~knuth/papers/claude-cycles.pdf>
PDF Document · 122 KB
--
Dr. Sergey Tverdyshev
PS: sent from phone with apologies for brevity, typos, and auto-corrections.
Last updated: Mar 14 2026 at 08:38 UTC