Looking for proofs of false, Isabelle edition · General

Stream: General

Topic: Looking for proofs of false, Isabelle edition

Tristan Stérin (Mar 11 2026 at 16:28):

Hello,

Last week my Opus 4.6 setup found proofs of false in Rocq and Lean (see blog post).

@Lukas Stevens sent me overview.md describing areas of suspected vulnerabilities in Isabelle, and I've asked Opus to find bugs and potentially exploit them into proofs of false in the Isabelle codebase.

Here is what it found so far: REPORT.md (Using Isabelle2025-2)

Section 1 can be ignored as this was already known (KuncarPopescu_Exploit.thy) but there may be interesting bugs in Sections 3 and 2.

I still have access to the agent, let me know if it should be steered in one direction or another :)

Dmitriy Traytel (Mar 11 2026 at 16:47):

Thanks for looking into this. 3.3 could be a one place where it might be worth to try harder. If dependencies really can be lost, then potentially the KuncarPopescu example can be resurrected. If theory merges are to blame, than any counterexample would need to consist of multiple theories (probably with a diamond-shaped import structure).

Tristan Stérin (Mar 11 2026 at 16:59):

With pleasure! OK I will steer it in that direction (currently stucked on an anthropic login bug)

Tristan Stérin (Mar 11 2026 at 19:55):

Capture d’écran 2026-03-11 à 20.54.15.png

Here is its updated report: REPORT_UPDATED.md it may contain new leads

Dmitriy Traytel (Mar 12 2026 at 07:45):

Good for the merges! The only place I am aware of where unchecked definitions are explicitly used is Nominal1 (src/HOL/Nominal/Nominal.thy). (In contrast Nominal2 in the AFP is fully foundational.) It might makes sense to try to make sense of these particular unchecked definitions (and the checked ones building on top) and see if they can give rise to any problems.

Tristan Stérin (Mar 12 2026 at 10:34):

I forwarded your message and it updated its report but does not seem to have found anything new:
Capture d’écran 2026-03-12 à 11.32.44.png
REPORT.md

Lukas Stevens (Mar 13 2026 at 16:44):

Hi Tristan,
thanks for all your efforts! We from the Copilots for Isabelle project are currently out of ideas on what other details to look out for. If time and tokens permit, you could try to have a go with less specific input to avoid "overfitting", so you might just point it to the primitives in thm.ML, overloading and typedef.

Tristan Stérin (Mar 13 2026 at 16:52):

Yes of course! Launching now

Tristan Stérin (Mar 13 2026 at 17:13):

It did not try very hard but maybe enough that there is clear direction to nudge it into?

Lukas Stevens (Mar 17 2026 at 12:21):

Yes, that doesn't look very promising. I don't have any specific ideas. I am wondering whether it generated any code to test these claims, or if they just come from considering the implementation?

Tristan Stérin (Mar 17 2026 at 12:32):

It generated 28 test files:
isabelle_tests_archive.tar.gz

Last updated: Jul 02 2026 at 14:06 UTC