Stream: Mirror: Isabelle Users Mailing List

Topic: [isabelle] DeepSeek experiment


view this post on Zulip Email Gateway (Jul 10 2025 at 10:42):

From: Lawrence Paulson <lp15@cam.ac.uk>
Hello all, you may be interested in a little experiment I tried using the DeepSeek LLM yesterday. I asked whether it could translate a theorem header from n-dimensional real vectors to abstract Euclidean spaces. My query:

The following theorem header fixes variable "a" to type "real^'n". Changing this type to "'a::euclidean_space" requires modifying the header, replacing $ and χ by their analogues for Euclidean spaces.

proposition
fixes a :: "real^'n"
assumes "m ≠ n" and ab_ne: "cbox a b ≠ {}" and an: "0 ≤ a$n"
shows measurable_shear_interval: "(λx. χ i. if i = m then x$m + x$n else x$i) (cbox a b) ∈ lmeasurable" and measure_shear_interval: "measure lebesgue ((λx. χ i. if i = m then x$m + x$n else x$i) cbox a b)
= measure lebesgue (cbox a b)"

On its first attempt it produced something that seemed to be almost right but clunky and referred to something non-standard. Pushing it a little further, I got the following, which is far as I can tell is correct. (Isabelle accepts it.)

proposition
fixes a :: "'a::euclidean_space"
assumes "m ≠ n" and "m ∈ Basis" and "n ∈ Basis"
and ab_ne: "cbox a b ≠ {}" and an: "0 ≤ a ∙ n"
shows measurable_shear_interval: "(λx. ∑i∈Basis. (if i = m then (x ∙ i) + (x ∙ n) else (x ∙ i)) *⇩R i) (cbox a b) ∈ lmeasurable" and measure_shear_interval: "measure lebesgue ((λx. ∑i∈Basis. (if i = m then (x ∙ i) + (x ∙ n) else (x ∙ i)) *⇩R i) cbox a b)
= measure lebesgue (cbox a b)”

Everything was accompanied by quantities of plausible sounding narrative, e.g. "this version is more abstract and elegant”. A sample:

This version is more type-safe because:
• No need to worry about index bounds
• The basis vectors are first-class entities
• The notation more clearly reflects the mathematical intent
Note that in this formulation:
• Basis is the canonical basis of the Euclidean space
• ∙ is the inner product (giving the component in a particular basis direction)
• *⇩R is the scalar multiplication
• The sum reconstructs the vector from its components
Would you like me to explain any part of this formulation in more detail? Or would you like assistance with proving this version of the theorem?

My impression is that it was grabbing material from a variety of places, in particular the AFP entry Ordinary_Differential_Equations. But the word "elegant" doesn't appear there and in fact appears in the entire distribution + AFP only 42 times.

I recall doing the precise task above in the past and needing the better part of an hour to get it right. I didn't try to port the proof (which was long and complicated) and I would be surprised if it could get that right.

I'm not sure how much more can be accomplished with the free version of DeepSeek. I have also seen Claude Sonnet 4 swallow entire Isabelle codebases and summarise them. These tools may already be useful for tasks such as identifying duplicate material.

Larry

view this post on Zulip Email Gateway (Jul 14 2025 at 09:47):

From: Tobias Nipkow <nipkow@in.tum.de>
Larry,

I recently asked chatGPT if it could prove that $f(x) = x * ln (1 + 1/x)$ is
monotone, but the answer was not helpful. How is DeepSeek doing on this one?
(I think it can be proved easily via an alternating series for ln (1 + x)).

Tobias

On 10/07/2025 11:42, Lawrence Paulson wrote:

Hello all, you may be interested in a little experiment I tried using the DeepSeek LLM yesterday. I asked whether it could translate a theorem header from n-dimensional real vectors to abstract Euclidean spaces. My query:

The following theorem header fixes variable "a" to type "real^'n". Changing this type to "'a::euclidean_space" requires modifying the header, replacing $ and χ by their analogues for Euclidean spaces.

proposition
fixes a :: "real^'n"
assumes "m ≠ n" and ab_ne: "cbox a b ≠ {}" and an: "0 ≤ a$n"
shows measurable_shear_interval: "(λx. χ i. if i = m then x$m + x$n else x$i) (cbox a b) ∈ lmeasurable" and measure_shear_interval: "measure lebesgue ((λx. χ i. if i = m then x$m + x$n else x$i) cbox a b)
= measure lebesgue (cbox a b)"

On its first attempt it produced something that seemed to be almost right but clunky and referred to something non-standard. Pushing it a little further, I got the following, which is far as I can tell is correct. (Isabelle accepts it.)

proposition
fixes a :: "'a::euclidean_space"
assumes "m ≠ n" and "m ∈ Basis" and "n ∈ Basis"
and ab_ne: "cbox a b ≠ {}" and an: "0 ≤ a ∙ n"
shows measurable_shear_interval: "(λx. ∑i∈Basis. (if i = m then (x ∙ i) + (x ∙ n) else (x ∙ i)) *⇩R i) (cbox a b) ∈ lmeasurable" and measure_shear_interval: "measure lebesgue ((λx. ∑i∈Basis. (if i = m then (x ∙ i) + (x ∙ n) else (x ∙ i)) *⇩R i) cbox a b)
= measure lebesgue (cbox a b)”

Everything was accompanied by quantities of plausible sounding narrative, e.g. "this version is more abstract and elegant”. A sample:

This version is more type-safe because:
• No need to worry about index bounds
• The basis vectors are first-class entities
• The notation more clearly reflects the mathematical intent
Note that in this formulation:
• Basis is the canonical basis of the Euclidean space
• ∙ is the inner product (giving the component in a particular basis direction)
• *⇩R is the scalar multiplication
• The sum reconstructs the vector from its components
Would you like me to explain any part of this formulation in more detail? Or would you like assistance with proving this version of the theorem?

My impression is that it was grabbing material from a variety of places, in particular the AFP entry Ordinary_Differential_Equations. But the word "elegant" doesn't appear there and in fact appears in the entire distribution + AFP only 42 times.

I recall doing the precise task above in the past and needing the better part of an hour to get it right. I didn't try to port the proof (which was long and complicated) and I would be surprised if it could get that right.

I'm not sure how much more can be accomplished with the free version of DeepSeek. I have also seen Claude Sonnet 4 swallow entire Isabelle codebases and summarise them. These tools may already be useful for tasks such as identifying duplicate material.

Larry

smime.p7s

view this post on Zulip Email Gateway (Jul 14 2025 at 16:54):

From: Jason Gross <jasongross9@gmail.com>
ChatGPT o3 seems able to prove that f(x) is monotone with a fairly direct
argument that the first derivative is positive:
https://chatgpt.com/share/68753563-d5a0-8007-aeea-4f48a3b55eb5

Cheers,
Jason

On Mon, Jul 14, 2025, 02:51 Tobias Nipkow <nipkow@in.tum.de> wrote:

Larry,

I recently asked chatGPT if it could prove that $f(x) = x * ln (1 + 1/x)$
is
monotone, but the answer was not helpful. How is DeepSeek doing on this
one?
(I think it can be proved easily via an alternating series for ln (1 + x)).

Tobias

On 10/07/2025 11:42, Lawrence Paulson wrote:

Hello all, you may be interested in a little experiment I tried using
the DeepSeek LLM yesterday. I asked whether it could translate a theorem
header from n-dimensional real vectors to abstract Euclidean spaces. My
query:

The following theorem header fixes variable "a" to type "real^'n".
Changing this type to "'a::euclidean_space" requires modifying the header,
replacing $ and χ by their analogues for Euclidean spaces.

proposition
fixes a :: "real^'n"
assumes "m ≠ n" and ab_ne: "cbox a b ≠ {}" and an: "0 ≤ a$n"
shows measurable_shear_interval: "(λx. χ i. if i = m then x$m + x$n
else x$i) (cbox a b) ∈ lmeasurable" and measure_shear_interval: "measure lebesgue ((λx. χ i. if i = m then x$m + x$n else x$i) cbox a b)
= measure lebesgue (cbox a b)"

On its first attempt it produced something that seemed to be almost
right but clunky and referred to something non-standard. Pushing it a
little further, I got the following, which is far as I can tell is correct.
(Isabelle accepts it.)

proposition
fixes a :: "'a::euclidean_space"
assumes "m ≠ n" and "m ∈ Basis" and "n ∈ Basis"
and ab_ne: "cbox a b ≠ {}" and an: "0 ≤ a ∙ n"
shows measurable_shear_interval: "(λx. ∑i∈Basis. (if i = m
then (x ∙ i) + (x ∙ n) else (x ∙ i)) *⇩R i) (cbox a b) ∈ lmeasurable" and measure_shear_interval: "measure lebesgue ((λx. ∑i∈Basis. (if i = m then (x ∙ i) + (x ∙ n) else (x ∙ i)) *⇩R i) cbox a b)
= measure lebesgue (cbox a b)”

Everything was accompanied by quantities of plausible sounding
narrative, e.g. "this version is more abstract and elegant”. A sample:

This version is more type-safe because:
• No need to worry about index bounds
• The basis vectors are first-class entities
• The notation more clearly reflects the mathematical intent
Note that in this formulation:
• Basis is the canonical basis of the Euclidean space
• ∙ is the inner product (giving the component in a particular
basis direction)
• *⇩R is the scalar multiplication
• The sum reconstructs the vector from its components
Would you like me to explain any part of this formulation in more
detail? Or would you like assistance with proving this version of the
theorem?

My impression is that it was grabbing material from a variety of places,
in particular the AFP entry Ordinary_Differential_Equations. But the word
"elegant" doesn't appear there and in fact appears in the entire
distribution + AFP only 42 times.

I recall doing the precise task above in the past and needing the better
part of an hour to get it right. I didn't try to port the proof (which was
long and complicated) and I would be surprised if it could get that right.

I'm not sure how much more can be accomplished with the free version of
DeepSeek. I have also seen Claude Sonnet 4 swallow entire Isabelle
codebases and summarise them. These tools may already be useful for tasks
such as identifying duplicate material.

Larry

view this post on Zulip Email Gateway (Jul 14 2025 at 17:53):

From: Tobias Nipkow <nipkow@in.tum.de>
Interesting. This is very different from what I got about f'. Haven't checked
"your" proof ...

Tobias

On 14/07/2025 17:53, Jason Gross wrote:

ChatGPT o3 seems able to prove that f(x) is monotone with a fairly direct
argument that the first derivative is positive: https://chatgpt.com/
share/68753563-d5a0-8007-aeea-4f48a3b55eb5 <https://chatgpt.com/share/68753563-
d5a0-8007-aeea-4f48a3b55eb5>

Cheers,
Jason

On Mon, Jul 14, 2025, 02:51 Tobias Nipkow <nipkow@in.tum.de
<mailto:nipkow@in.tum.de>> wrote:

Larry,

I recently asked chatGPT if it could prove that $f(x) = x * ln (1 + 1/x)$ is
monotone, but the answer was not helpful. How is DeepSeek doing on this one?
(I think it can be proved easily via an alternating series for ln (1 + x)).

Tobias

On 10/07/2025 11:42, Lawrence Paulson wrote:
> Hello all, you may be interested in a little experiment I tried using the
DeepSeek LLM yesterday. I asked whether it could translate a theorem header
from n-dimensional real vectors to abstract Euclidean spaces. My query:
>
>> The following theorem header fixes variable "a" to type "real^'n".
Changing this type to "'a::euclidean_space" requires modifying the header,
replacing $ and χ by their analogues for Euclidean spaces.
>>
>> proposition
>>    fixes a :: "real^'n"
>>    assumes "m ≠ n" and ab_ne: "cbox a b ≠ {}" and an: "0 ≤ a$n"
>>    shows measurable_shear_interval: "(λx. χ i. if i = m then x$m + x$n
else x$i) (cbox a b) ∈ lmeasurable" >>     and measure_shear_interval: "measure lebesgue ((λx. χ i. if i = m then x$m + x$n else x$i) cbox a b)
>>                 = measure lebesgue (cbox a b)"
>
> On its first attempt it produced something that seemed to be almost right
but clunky and referred to something non-standard. Pushing it a little
further, I got the following, which is far as I can tell is correct.
(Isabelle accepts it.)
>
>> proposition
>>    fixes a :: "'a::euclidean_space"
>>    assumes "m ≠ n"      and "m ∈ Basis" and "n ∈ Basis"
>>      and ab_ne: "cbox a b ≠ {}"      and an: "0 ≤ a ∙ n"
>>    shows measurable_shear_interval:          "(λx. ∑i∈Basis. (if i = m
then (x ∙ i) + (x ∙ n)                            else (x ∙ i)) *⇩R i) (cbox a b) ∈ lmeasurable" >>     and measure_shear_interval:          "measure lebesgue ((λx. ∑i∈Basis. (if i = m then (x ∙ i) + (x ∙ n)         else (x ∙ i)) *⇩R i) cbox a b)
>>           = measure lebesgue (cbox a b)”
>
>
> Everything was accompanied by quantities of plausible sounding narrative,
e.g. "this version is more abstract and elegant”. A sample:
>
>> This version is more type-safe because:
>>      • No need to worry about index bounds
>>      • The basis vectors are first-class entities
>>      • The notation more clearly reflects the mathematical intent
>> Note that in this formulation:
>>      • Basis is the canonical basis of the Euclidean space
>>      • ∙ is the inner product (giving the component in a particular
basis direction)
>>      • *⇩R is the scalar multiplication
>>      • The sum reconstructs the vector from its components
>> Would you like me to explain any part of this formulation in more
detail? Or would you like assistance with proving this version of the theorem?
>
> My impression is that it was grabbing material from a variety of places,
in particular the AFP entry Ordinary_Differential_Equations. But the word
"elegant" doesn't appear there and in fact appears in the entire
distribution + AFP only 42 times.
>
> I recall doing the precise task above in the past and needing the better
part of an hour to get it right. I didn't try to port the proof (which was
long and complicated) and I would be surprised if it could get that right.
>
> I'm not sure how much more can be accomplished with the free version of
DeepSeek. I have also seen Claude Sonnet 4 swallow entire Isabelle codebases
and summarise them. These tools may already be useful for tasks such as
identifying duplicate material.
>
> Larry
>

smime.p7s

view this post on Zulip Email Gateway (Jul 25 2025 at 17:54):

From: "\"Vromen, H.J. (Huub)\"" <cl-isabelle-users@lists.cam.ac.uk>
Tobias,

I submitted the question of the monotonicity of the function f(x) = x * ln (1 + 1/x) to Deepseek. To that, I received a particularly detailed answer with many explanations of the thinking steps. Since Deepseek does not have the ability to share a conversation, I have put the answer in this PDF document.
Huub


Last updated: Jul 26 2025 at 12:43 UTC