Stream: General

Topic: Artificial intelligence for Isabelle


view this post on Zulip Craig Alan Feinstein (May 25 2025 at 18:52):

Has anyone tried using artificial intelligence like ChatGPT to save time writing proofs in Isabelle? I have tried it with ChatGPT and Claude on the $20-a-month plans. Has anyone tried any other AI tools?

view this post on Zulip Craig Alan Feinstein (May 25 2025 at 19:34):

I just found this article saying Baldur works well with Isabelle. https://spectrum.ieee.org/amp/ai-debug-software-2667018624

view this post on Zulip Mathias Fleury (May 25 2025 at 19:43):

I have never seen a demo of the system showing that it works for the end user.

view this post on Zulip Craig Alan Feinstein (May 25 2025 at 23:23):

I didn’t see it either.

view this post on Zulip Yutaka Nagashima (May 26 2025 at 17:09):

The AbductionProver for Isabelle is currently more of a search-based approach, but I still call it an AI prover.
(Also, it runs on a laptop without requiring expensive graphics cards.)

https://youtu.be/rXU-lJxP_GI

https://github.com/data61/PSL

view this post on Zulip Craig Alan Feinstein (May 26 2025 at 20:37):

Thank you, Yutaka, I will check that out. I was about to give up on Claude yesterday after discovering that if you stay in the same conversation long enough, it appears to run out of intelligent things to say about your code. However, I discovered that if you transfer to another conversation, it will come up with more intelligent comments about the code.

view this post on Zulip Yutaka Nagashima (May 26 2025 at 21:39):

I haven't tried Claude 4 yet.

Craig Alan Feinstein said:

However, I discovered that if you transfer to another conversation, it will come up with more intelligent comments about the code.

Overall, this seems like an issue someone else will automatically solve after n months or weeks—if they haven't already. We just need to wait while focusing on something else—unless you're really keen to tackle it yourself. :stuck_out_tongue_wink:

view this post on Zulip Craig Alan Feinstein (May 26 2025 at 23:33):

Yutaka, I’m at a point where I would like to finish the proofs I’ve been working on for a long time. At this point, if I can do that using AI, I’ll use it.

view this post on Zulip Craig Alan Feinstein (May 28 2025 at 13:57):

I figure AI is in many ways like a human. If you talk too much to it, it gets tired, and eventually it gets so exhausted that it forgets a lot of what has been said and starts saying stupid stuff.

view this post on Zulip Craig Alan Feinstein (Aug 17 2025 at 06:30):

I’ve been experimenting with ChatGPT 5. It is much better than ChatGPT 4 but still has a way to go with Isabelle proofs. It gives code with lots of errors a significant amount of the time. And when I asked it to fix the errors, I got more errors much of the time. It is definitely improving, though. I do think it will eventually hit a point where it will be successful with high probability.

view this post on Zulip Craig Alan Feinstein (Aug 17 2025 at 20:59):

I’m finding that Sledgehammer is a good bet whenever ChatGPT’s proof doesn’t work.
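For readers who haven't used it: Sledgehammer is invoked as a diagnostic command at any open goal, and it queries external automatic provers for a one-line proof you can paste in. A minimal sketch (the lemma is illustrative; the suggested tactic varies with your installation and which ATPs respond):

```isabelle
theory Scratch
  imports Main
begin

(* A goal you might be left with after an LLM-generated proof fails: *)
lemma "rev (rev xs) = xs"
  sledgehammer
  (* Sledgehammer reports suggestions in the output panel; here a
     one-liner like the following closes the goal: *)
  by simp

end
```

The `sledgehammer` command does not change the proof state, so you can leave it in place while trying the suggestions it prints.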

view this post on Zulip irvin (Aug 18 2025 at 14:23):

I've never got an LLM to generate Isabelle/ML code that works.

view this post on Zulip Craig Alan Feinstein (Aug 25 2025 at 01:53):

The code that ChatGPT 5 generates usually won’t work on the first try, but if you work with it, it can eliminate the errors in many cases; you have to be persistent. Also, it appears to me that proofs that use algebra and number theory are harder for ChatGPT than plain logic. That’s what I’ve found since it came out.

view this post on Zulip Craig Alan Feinstein (Dec 21 2025 at 16:48):

ChatGPT 5.2 is out and it is considerably better than 5.0 for Isabelle. But there is still room for improvement.

view this post on Zulip Mario Xerxes Castelán Castro (Dec 22 2025 at 03:52):

:brain: is out and it is considerably better than LLMs but there is still room for improvement.

view this post on Zulip Craig Alan Feinstein (Dec 28 2025 at 02:22):

Are you comparing human brains with LLMs? I think human brains have already lost based on my experience with ChatGPT 5.2. At this point human brains are only valuable as a way of checking when LLMs get it wrong.

view this post on Zulip Mario Xerxes Castelán Castro (Dec 28 2025 at 02:55):

Craig Alan Feinstein said:

At this point human brains are only valuable as a way of checking when LLMs get it wrong.

It sounds like that is the case for your brain and you incorrectly generalized that to everybody.

view this post on Zulip Mario Xerxes Castelán Castro (Dec 28 2025 at 03:20):

You may be misled by using LLMs to prove theorems similar to those in the training set. In such cases, of course LLMs are going to perform well: they apply the same patterns they memorized. When confronted with novel problems, their performance ranges from mediocre to useless. IMO problems are not especially novel and only involve concepts from high-school mathematics, yet LLMs still perform badly: <https://matharena.ai/?view=problem&comp=imo--imo_2025> (not mechanized).

view this post on Zulip Craig Alan Feinstein (Dec 28 2025 at 07:20):

You are right that they might not be imaginative enough to solve really hard problems, but other than that they are great.

view this post on Zulip Craig Alan Feinstein (Dec 28 2025 at 18:54):

AI helped me finish three Isabelle projects, and I sent them to the Archive of Formal Proofs. They are on GitHub under cafeinst. I wouldn’t have been able to finish them without AI. I’m still trying to finish the Bruck-Ryser-Chowla theorem, though.

view this post on Zulip Craig Alan Feinstein (Dec 28 2025 at 18:55):

I think Isabelle plus AI is the future of math.

view this post on Zulip Mario Xerxes Castelán Castro (Dec 29 2025 at 00:19):

I get that LLMs can be useful for routine proofs. I use them sometimes as enhanced search engines but not to do my thinking.
However, I do not see how LLMs would become better at formalizing mathematics than a mathematician knowledgeable in Isabelle (or another ITP) and in the specific area involved. LLMs have a lot of trouble reasoning consistently (without hallucinating).

view this post on Zulip Mario Xerxes Castelán Castro (Dec 29 2025 at 00:24):

When using AI tools, beware of the possibility of becoming the bottom end of a "reverse centaur":

A centaur is a human being who is assisted by a machine that does some onerous task (like transcribing 40 hours of podcasts). A reverse-centaur is a machine that is assisted by a human being, who is expected to work at the machine’s pace.

(source: <https://locusmag.com/feature/commentary-cory-doctorow-reverse-centaurs/>)
This is especially a risk when using AI to do your thinking.

view this post on Zulip Craig Alan Feinstein (Dec 29 2025 at 00:31):

That article was written at the beginning of September. AI has improved greatly since then.

view this post on Zulip Craig Alan Feinstein (Dec 29 2025 at 01:02):

For instance I cannot even remember what hallucinations are like.

view this post on Zulip Mario Xerxes Castelán Castro (Dec 29 2025 at 01:25):

Craig Alan Feinstein said:

That article was written at the beginning of September. AI has improved greatly since then.

That article is a source just for the concept of "reverse centaur". I get hallucinations from LLMs daily. I do not see that supposed great improvement.

view this post on Zulip Mario Xerxes Castelán Castro (Dec 29 2025 at 01:27):

My point is that regardless of whether AI is good or bad at the task, if you offload a significant amount of your thinking to it, you become an aide to the machine instead of the other way around.

view this post on Zulip Craig Alan Feinstein (Dec 29 2025 at 13:52):

That is true.


Last updated: Feb 06 2026 at 20:37 UTC