Stream: Mirror: Isabelle Development Mailing List

Topic: [isabelle] DeepSeek experiment


Email Gateway (Jul 10 2025 at 21:50):

From: Humam Alhusaini via isabelle-dev <isabelle-dev@mailman.proof.cit.tum.de>

I understand that the use of LLMs for formal proofs is an exciting and fascinating prospect, but it should be kept in mind that the majority of these LLMs suck up information from anywhere, without regard for licensing. This is why the Isabelle dev blog is restricted to accounts. I'd recommend that the maintainers revisit Anubis, as it is not as "nuclear" as restricting the dev blog to accounts.

Jul 10, 2025, 05:42 by lp15@cam.ac.uk:

Hello all, you may be interested in a little experiment I tried using the DeepSeek LLM yesterday. I asked whether it could translate a theorem header from n-dimensional real vectors to abstract Euclidean spaces. My query:

The following theorem header fixes variable "a" to type "real^'n". Changing this type to "'a::euclidean_space" requires modifying the header, replacing $ and χ by their analogues for Euclidean spaces.

proposition
  fixes a :: "real^'n"
  assumes "m ≠ n" and ab_ne: "cbox a b ≠ {}" and an: "0 ≤ a$n"
  shows measurable_shear_interval: "(λx. χ i. if i = m then x$m + x$n else x$i) ` (cbox a b) ∈ lmeasurable"
    and measure_shear_interval: "measure lebesgue ((λx. χ i. if i = m then x$m + x$n else x$i) ` cbox a b)
      = measure lebesgue (cbox a b)"

On its first attempt it produced something that seemed almost right but was clunky and referred to something non-standard. Pushing it a little further, I got the following, which as far as I can tell is correct. (Isabelle accepts it.)

proposition
  fixes a :: "'a::euclidean_space"
  assumes "m ≠ n" and "m ∈ Basis" and "n ∈ Basis"
    and ab_ne: "cbox a b ≠ {}" and an: "0 ≤ a ∙ n"
  shows measurable_shear_interval: "(λx. ∑i∈Basis. (if i = m then (x ∙ i) + (x ∙ n) else (x ∙ i)) *⇩R i) ` (cbox a b) ∈ lmeasurable"
    and measure_shear_interval: "measure lebesgue ((λx. ∑i∈Basis. (if i = m then (x ∙ i) + (x ∙ n) else (x ∙ i)) *⇩R i) ` cbox a b)
      = measure lebesgue (cbox a b)"
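
The translation above swaps component indexing (x$i) for an inner product with a basis vector (x ∙ i), and the vector constructor χ for a sum over Basis. As a rough numerical illustration only, the following Python sketch checks on one sample vector that the two forms of the shear map agree. It assumes R^3 with the standard basis as a stand-in for an abstract Euclidean space; the function names are hypothetical and do not come from Isabelle.

```python
# Sketch only: R^3 with its standard basis stands in for a Euclidean space.
# All names here (basis, inner, shear_coords, shear_basis) are illustrative.

def basis(dim):
    """Standard basis vectors of R^dim, each as a tuple of floats."""
    return [tuple(1.0 if j == i else 0.0 for j in range(dim)) for i in range(dim)]

def inner(x, y):
    """Inner product, playing the role of x ∙ y."""
    return sum(a * b for a, b in zip(x, y))

def shear_coords(x, m, n):
    """Index form: χ i. if i = m then x$m + x$n else x$i (m, n are indices)."""
    return tuple(x[m] + x[n] if i == m else x[i] for i in range(len(x)))

def shear_basis(x, m, n, B):
    """Basis form: ∑i∈Basis. (if i = m then x∙i + x∙n else x∙i) *⇩R i,
    where m and n are basis vectors rather than indices."""
    result = [0.0] * len(x)
    for i in B:
        coeff = inner(x, i) + inner(x, n) if i == m else inner(x, i)
        for k in range(len(x)):
            result[k] += coeff * i[k]  # add coeff *⇩R i componentwise
    return tuple(result)

B = basis(3)
x = (2.0, -1.0, 5.0)
# Index 0 corresponds to basis vector B[0], index 2 to B[2].
assert shear_coords(x, 0, 2) == shear_basis(x, B[0], B[2], B)
```

This is a sanity check on a single point, not a proof. It also shows why the translated header needs the extra assumptions "m ∈ Basis" and "n ∈ Basis": in the basis form, m and n are vectors rather than indices.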

Everything was accompanied by quantities of plausible-sounding narrative, e.g. "this version is more abstract and elegant". A sample:

This version is more type-safe because:
• No need to worry about index bounds
• The basis vectors are first-class entities
• The notation more clearly reflects the mathematical intent
Note that in this formulation:
• Basis is the canonical basis of the Euclidean space
• ∙ is the inner product (giving the component in a particular basis direction)
• *⇩R is the scalar multiplication
• The sum reconstructs the vector from its components
Would you like me to explain any part of this formulation in more detail? Or would you like assistance with proving this version of the theorem?

My impression is that it was grabbing material from a variety of places, in particular the AFP entry Ordinary_Differential_Equations. But the word "elegant" doesn't appear there and in fact appears in the entire distribution + AFP only 42 times.

I recall doing the precise task above in the past and needing the better part of an hour to get it right. I didn't try to port the proof (which was long and complicated) and I would be surprised if it could get that right.

I'm not sure how much more can be accomplished with the free version of DeepSeek. I have also seen Claude Sonnet 4 swallow entire Isabelle codebases and summarise them. These tools may already be useful for tasks such as identifying duplicate material.

Larry


Last updated: Jul 12 2025 at 16:25 UTC