Has anyone tried using artificial intelligence like chatgpt to save time writing proofs in Isabelle. I have tried it with ChatGpt and Claude where you pay $20 a month. Has anyone tried any other AI tools?
I just found this article saying Baldur works well with Isabelle. https://spectrum.ieee.org/amp/ai-debug-software-2667018624
I have never seen a demo of the system showing that it is working for the end-user
I didn’t see it either.
The AbductionProver for Isabelle is currently more of a search-based approach, but I still call it an AI prover.
(Also, it runs on a laptop without requiring expensive graphics cards.)
Thank you Yutaka I will check that out. I was about to give up on Claude yesterday after having discovered that if you stay on the same conversation for a long enough time, it appears to run out of intelligent things to say with respect to your code. However, I discovered that if you transfer to another conversation, it will come up with more intelligent comments about the code.
I haven't tried Claude 4 yet.
However, I discovered that if you transfer to another conversation, it will come up with more intelligent comments about the code.
Overall, this seems like an issue someone else will automatically solve after n months or weeks—if they haven't already. We just need to wait while focusing on something else—unless you're really keen to tackle it yourself. :stuck_out_tongue_wink:
Yutaka I’m in a spot where I would like to finish doing my proofs I’ve been working on for a long time. At this point if I can do this using ai I’ll use it.
I figure ai is in many ways like a human. If you talk to much to it, it gets tired and eventually it gets so exhausted that it forgets a lot of what has been said and starts saying stupid stuff
I’ve been experimenting with ChatGPT 5. It is much better than ChatGPT 4 but still has a way to go with Isabelle proofs. It gives code with lots of errors a significant amount of the time. And when I asked it to it to fix the errors I got more errors much of the time. It is definitely improving though. I do think it will eventually hit a point where it will be successful with high probability.
I’m finding that sledgehammer is a good bet whenever it doesn’t work
I've never got a LLM to generate Isabelle/ML
code that works.
Last updated: Aug 23 2025 at 01:39 UTC