Has anyone tried using artificial intelligence like ChatGPT to save time writing proofs in Isabelle? I have tried the $20-a-month versions of ChatGPT and Claude. Has anyone tried any other AI tools?
I just found this article saying Baldur works well with Isabelle. https://spectrum.ieee.org/amp/ai-debug-software-2667018624
I have never seen a demo of the system showing it working for the end user.
I didn’t see it either.
The AbductionProver for Isabelle is currently more of a search-based approach, but I still call it an AI prover.
(Also, it runs on a laptop without requiring expensive graphics cards.)
Thank you, Yutaka, I will check that out. I was about to give up on Claude yesterday after discovering that if you stay in the same conversation for long enough, it appears to run out of intelligent things to say about your code. However, I discovered that if you transfer to another conversation, it will come up with more intelligent comments about the code.
I haven't tried Claude 4 yet.
Craig Alan Feinstein said:
However, I discovered that if you transfer to another conversation, it will come up with more intelligent comments about the code.
Overall, this seems like an issue someone else will automatically solve after n months or weeks—if they haven't already. We just need to wait while focusing on something else—unless you're really keen to tackle it yourself. :stuck_out_tongue_wink:
Yutaka, I’m in a spot where I would like to finish the proofs I’ve been working on for a long time. At this point, if I can do that using AI, I’ll use it.
I figure AI is in many ways like a human. If you talk to it too much, it gets tired, and eventually it gets so exhausted that it forgets a lot of what has been said and starts saying stupid stuff.
I’ve been experimenting with ChatGPT 5. It is much better than ChatGPT 4 but still has a way to go with Isabelle proofs. It gives code with lots of errors a significant amount of the time. And when I asked it to fix the errors, I got more errors much of the time. It is definitely improving, though. I do think it will eventually hit a point where it will be successful with high probability.
I’m finding that sledgehammer is a good bet whenever the AI-generated proof doesn’t work.
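In case it helps anyone who hasn't used it: a minimal sketch of invoking sledgehammer inside a throwaway theory (the lemma is just an illustrative example, not taken from anyone's project):

theory Scratch
  imports Main
begin

(* State a goal and let sledgehammer search for a proof using external ATPs/SMT solvers. *)
lemma "rev (rev xs) = xs"
  sledgehammer
  (* sledgehammer reports a one-line proof to paste in, e.g. "by simp", which closes the goal: *)
  by simp

end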
I've never gotten an LLM to generate Isabelle/ML code that works.
The code that ChatGPT 5 generates usually won’t work on the first try, but if you work with it, it can eliminate the errors in many cases; you have to be persistent. Also, it appears to me that proofs that use algebra and number theory are harder for ChatGPT than plain logic. That’s what I’ve found since it came out.
ChatGPT 5.2 is out and it is considerably better than 5.0 for Isabelle. But there is still room for improvement.
:brain: is out and it is considerably better than LLMs but there is still room for improvement.
Are you comparing human brains with LLMs? I think human brains have already lost based on my experience with ChatGPT 5.2. At this point human brains are only valuable as a way of checking when LLMs get it wrong.
Craig Alan Feinstein said:
At this point human brains are only valuable as a way of checking when LLMs get it wrong.
It sounds like that is the case for your brain and you incorrectly generalized that to everybody.
You may be misled by using LLMs to prove theorems similar to what is in the training set. In such cases, of course LLMs are going to perform well: they apply the same patterns they memorized. When confronted with novel problems, their performance ranges from mediocre to useless. IMO problems are not especially novel and only involve concepts from high-school mathematics, yet LLMs still perform badly <https://matharena.ai/?view=problem&comp=imo--imo_2025> (not mechanized).
You are right that they might not be imaginative enough to solve really hard problems, but other than that they are great.
AI helped me finish three Isabelle projects, and I sent them to the Archive of Formal Proofs. They are on GitHub under cafeinst. I wouldn’t have been able to finish them without AI. I’m still trying to finish the Bruck-Ryser-Chowla theorem, though.
I think Isabelle plus AI is the future of math.
I get that LLMs can be useful for routine proofs. I use them sometimes as enhanced search engines but not to do my thinking.
However, I do not see how LLMs would become better at formalizing mathematics than a mathematician knowledgeable in Isabelle (or another ITP) and in the specific area involved. LLMs have a lot of trouble reasoning consistently (without hallucinating).
When using AI tools, beware of the possibility of becoming the bottom end of a "reverse centaur":
A centaur is a human being who is assisted by a machine that does some onerous task (like transcribing 40 hours of podcasts). A reverse-centaur is a machine that is assisted by a human being, who is expected to work at the machine’s pace.
(source: <https://locusmag.com/feature/commentary-cory-doctorow-reverse-centaurs/>)
This is especially a risk when using AI to do your thinking.
That article was written at the beginning of September. AI has improved greatly since then.
For instance, I cannot even remember what hallucinations are like.
Craig Alan Feinstein said:
That article was written at the beginning of September. AI has improved greatly since then.
That article is a source just for the concept of "reverse centaur". I get hallucinations from LLMs daily. I do not see that supposed great improvement.
My point is that regardless of whether AI is good or bad at the task, if you offload a significant amount of your thinking to it, you are becoming an aide to the machine instead of the other way around.
That is true.