[isabelle] DeepSeek experiment · Mirror: Isabelle Users Mailing List

Stream: Mirror: Isabelle Users Mailing List

Topic: [isabelle] DeepSeek experiment

Email Gateway (Jul 10 2025 at 10:42):

From: Lawrence Paulson <lp15@cam.ac.uk>
Hello all, you may be interested in a little experiment I tried using the DeepSeek LLM yesterday. I asked whether it could translate a theorem header from n-dimensional real vectors to abstract Euclidean spaces. My query:

The following theorem header fixes variable "a" to type "real^'n". Changing this type to "'a::euclidean_space" requires modifying the header, replacing $ and χ by their analogues for Euclidean spaces.

proposition
fixes a :: "real^'n"
assumes "m ≠ n" and ab_ne: "cbox a b ≠ {}" and an: "0 ≤ a$n"
shows measurable_shear_interval: "(λx. χ i. if i = m then x$m + x$n else x$i) (cbox a b) ∈ lmeasurable" and measure_shear_interval: "measure lebesgue ((λx. χ i. if i = m then x$m + x$n else x$i) cbox a b)
= measure lebesgue (cbox a b)"

On its first attempt it produced something that seemed to be almost right but clunky and referred to something non-standard. Pushing it a little further, I got the following, which is far as I can tell is correct. (Isabelle accepts it.)

proposition
fixes a :: "'a::euclidean_space"
assumes "m ≠ n" and "m ∈ Basis" and "n ∈ Basis"
and ab_ne: "cbox a b ≠ {}" and an: "0 ≤ a ∙ n"
shows measurable_shear_interval: "(λx. ∑i∈Basis. (if i = m then (x ∙ i) + (x ∙ n) else (x ∙ i)) *⇩R i) (cbox a b) ∈ lmeasurable" and measure_shear_interval: "measure lebesgue ((λx. ∑i∈Basis. (if i = m then (x ∙ i) + (x ∙ n) else (x ∙ i)) *⇩R i) cbox a b)
= measure lebesgue (cbox a b)”

Everything was accompanied by quantities of plausible sounding narrative, e.g. "this version is more abstract and elegant”. A sample:

This version is more type-safe because:
• No need to worry about index bounds
• The basis vectors are first-class entities
• The notation more clearly reflects the mathematical intent
Note that in this formulation:
• Basis is the canonical basis of the Euclidean space
• ∙ is the inner product (giving the component in a particular basis direction)
• *⇩R is the scalar multiplication
• The sum reconstructs the vector from its components
Would you like me to explain any part of this formulation in more detail? Or would you like assistance with proving this version of the theorem?

My impression is that it was grabbing material from a variety of places, in particular the AFP entry Ordinary_Differential_Equations. But the word "elegant" doesn't appear there and in fact appears in the entire distribution + AFP only 42 times.

I recall doing the precise task above in the past and needing the better part of an hour to get it right. I didn't try to port the proof (which was long and complicated) and I would be surprised if it could get that right.

I'm not sure how much more can be accomplished with the free version of DeepSeek. I have also seen Claude Sonnet 4 swallow entire Isabelle codebases and summarise them. These tools may already be useful for tasks such as identifying duplicate material.

Larry

Email Gateway (Jul 14 2025 at 09:47):

From: Tobias Nipkow <nipkow@in.tum.de>
Larry,

I recently asked chatGPT if it could prove that $f(x) = x * ln (1 + 1/x)$ is
monotone, but the answer was not helpful. How is DeepSeek doing on this one?
(I think it can be proved easily via an alternating series for ln (1 + x)).

Tobias

On 10/07/2025 11:42, Lawrence Paulson wrote:

Hello all, you may be interested in a little experiment I tried using the DeepSeek LLM yesterday. I asked whether it could translate a theorem header from n-dimensional real vectors to abstract Euclidean spaces. My query:

The following theorem header fixes variable "a" to type "real^'n". Changing this type to "'a::euclidean_space" requires modifying the header, replacing $ and χ by their analogues for Euclidean spaces.

proposition
fixes a :: "real^'n"
assumes "m ≠ n" and ab_ne: "cbox a b ≠ {}" and an: "0 ≤ a$n"
shows measurable_shear_interval: "(λx. χ i. if i = m then x$m + x$n else x$i) (cbox a b) ∈ lmeasurable" and measure_shear_interval: "measure lebesgue ((λx. χ i. if i = m then x$m + x$n else x$i) cbox a b)
= measure lebesgue (cbox a b)"

On its first attempt it produced something that seemed to be almost right but clunky and referred to something non-standard. Pushing it a little further, I got the following, which is far as I can tell is correct. (Isabelle accepts it.)

proposition
fixes a :: "'a::euclidean_space"
assumes "m ≠ n" and "m ∈ Basis" and "n ∈ Basis"
and ab_ne: "cbox a b ≠ {}" and an: "0 ≤ a ∙ n"
shows measurable_shear_interval: "(λx. ∑i∈Basis. (if i = m then (x ∙ i) + (x ∙ n) else (x ∙ i)) *⇩R i) (cbox a b) ∈ lmeasurable" and measure_shear_interval: "measure lebesgue ((λx. ∑i∈Basis. (if i = m then (x ∙ i) + (x ∙ n) else (x ∙ i)) *⇩R i) cbox a b)
= measure lebesgue (cbox a b)”

Everything was accompanied by quantities of plausible sounding narrative, e.g. "this version is more abstract and elegant”. A sample:

This version is more type-safe because:
• No need to worry about index bounds
• The basis vectors are first-class entities
• The notation more clearly reflects the mathematical intent
Note that in this formulation:
• Basis is the canonical basis of the Euclidean space
• ∙ is the inner product (giving the component in a particular basis direction)
• *⇩R is the scalar multiplication
• The sum reconstructs the vector from its components
Would you like me to explain any part of this formulation in more detail? Or would you like assistance with proving this version of the theorem?

My impression is that it was grabbing material from a variety of places, in particular the AFP entry Ordinary_Differential_Equations. But the word "elegant" doesn't appear there and in fact appears in the entire distribution + AFP only 42 times.

I recall doing the precise task above in the past and needing the better part of an hour to get it right. I didn't try to port the proof (which was long and complicated) and I would be surprised if it could get that right.

I'm not sure how much more can be accomplished with the free version of DeepSeek. I have also seen Claude Sonnet 4 swallow entire Isabelle codebases and summarise them. These tools may already be useful for tasks such as identifying duplicate material.

Larry

smime.p7s

Email Gateway (Jul 14 2025 at 16:54):

From: Jason Gross <jasongross9@gmail.com>
ChatGPT o3 seems able to prove that f(x) is monotone with a fairly direct
argument that the first derivative is positive:
https://chatgpt.com/share/68753563-d5a0-8007-aeea-4f48a3b55eb5

Cheers,
Jason

On Mon, Jul 14, 2025, 02:51 Tobias Nipkow <nipkow@in.tum.de> wrote:

Larry,

I recently asked chatGPT if it could prove that $f(x) = x * ln (1 + 1/x)$
is
monotone, but the answer was not helpful. How is DeepSeek doing on this
one?
(I think it can be proved easily via an alternating series for ln (1 + x)).

Tobias

On 10/07/2025 11:42, Lawrence Paulson wrote:

Hello all, you may be interested in a little experiment I tried using
the DeepSeek LLM yesterday. I asked whether it could translate a theorem
header from n-dimensional real vectors to abstract Euclidean spaces. My
query:

The following theorem header fixes variable "a" to type "real^'n".
Changing this type to "'a::euclidean_space" requires modifying the header,
replacing $ and χ by their analogues for Euclidean spaces.

proposition
fixes a :: "real^'n"
assumes "m ≠ n" and ab_ne: "cbox a b ≠ {}" and an: "0 ≤ a$n"
shows measurable_shear_interval: "(λx. χ i. if i = m then x$m + x$n
else x$i) (cbox a b) ∈ lmeasurable" and measure_shear_interval: "measure lebesgue ((λx. χ i. if i = m then x$m + x$n else x$i) cbox a b)
= measure lebesgue (cbox a b)"

On its first attempt it produced something that seemed to be almost
right but clunky and referred to something non-standard. Pushing it a
little further, I got the following, which is far as I can tell is correct.
(Isabelle accepts it.)

proposition
fixes a :: "'a::euclidean_space"
assumes "m ≠ n" and "m ∈ Basis" and "n ∈ Basis"
and ab_ne: "cbox a b ≠ {}" and an: "0 ≤ a ∙ n"
shows measurable_shear_interval: "(λx. ∑i∈Basis. (if i = m
then (x ∙ i) + (x ∙ n) else (x ∙ i)) *⇩R i) (cbox a b) ∈ lmeasurable" and measure_shear_interval: "measure lebesgue ((λx. ∑i∈Basis. (if i = m then (x ∙ i) + (x ∙ n) else (x ∙ i)) *⇩R i) cbox a b)
= measure lebesgue (cbox a b)”

Everything was accompanied by quantities of plausible sounding
narrative, e.g. "this version is more abstract and elegant”. A sample:

This version is more type-safe because:
• No need to worry about index bounds
• The basis vectors are first-class entities
• The notation more clearly reflects the mathematical intent
Note that in this formulation:
• Basis is the canonical basis of the Euclidean space
• ∙ is the inner product (giving the component in a particular
basis direction)
• *⇩R is the scalar multiplication
• The sum reconstructs the vector from its components
Would you like me to explain any part of this formulation in more
detail? Or would you like assistance with proving this version of the
theorem?

My impression is that it was grabbing material from a variety of places,
in particular the AFP entry Ordinary_Differential_Equations. But the word
"elegant" doesn't appear there and in fact appears in the entire
distribution + AFP only 42 times.

I recall doing the precise task above in the past and needing the better
part of an hour to get it right. I didn't try to port the proof (which was
long and complicated) and I would be surprised if it could get that right.

I'm not sure how much more can be accomplished with the free version of
DeepSeek. I have also seen Claude Sonnet 4 swallow entire Isabelle
codebases and summarise them. These tools may already be useful for tasks
such as identifying duplicate material.

Larry

Email Gateway (Jul 14 2025 at 17:53):

From: Tobias Nipkow <nipkow@in.tum.de>
Interesting. This is very different from what I got about f'. Haven't checked
"your" proof ...

Tobias

On 14/07/2025 17:53, Jason Gross wrote:

ChatGPT o3 seems able to prove that f(x) is monotone with a fairly direct
argument that the first derivative is positive: https://chatgpt.com/
share/68753563-d5a0-8007-aeea-4f48a3b55eb5 <https://chatgpt.com/share/68753563-
d5a0-8007-aeea-4f48a3b55eb5>

Cheers,
Jason

On Mon, Jul 14, 2025, 02:51 Tobias Nipkow <nipkow@in.tum.de
<mailto:nipkow@in.tum.de>> wrote:

Larry,

I recently asked chatGPT if it could prove that $f(x) = x * ln (1 + 1/x)$ is
monotone, but the answer was not helpful. How is DeepSeek doing on this one?
(I think it can be proved easily via an alternating series for ln (1 + x)).

Tobias

On 10/07/2025 11:42, Lawrence Paulson wrote:
> Hello all, you may be interested in a little experiment I tried using the
DeepSeek LLM yesterday. I asked whether it could translate a theorem header
from n-dimensional real vectors to abstract Euclidean spaces. My query:
>
>> The following theorem header fixes variable "a" to type "real^'n".
Changing this type to "'a::euclidean_space" requires modifying the header,
replacing $ and χ by their analogues for Euclidean spaces.
>>
>> proposition
>> fixes a :: "real^'n"
>> assumes "m ≠ n" and ab_ne: "cbox a b ≠ {}" and an: "0 ≤ a$n"
>> shows measurable_shear_interval: "(λx. χ i. if i = m then x$m + x$n
else x$i) (cbox a b) ∈ lmeasurable" >> and measure_shear_interval: "measure lebesgue ((λx. χ i. if i = m then x$m + x$n else x$i) cbox a b)
>> = measure lebesgue (cbox a b)"
>
> On its first attempt it produced something that seemed to be almost right
but clunky and referred to something non-standard. Pushing it a little
further, I got the following, which is far as I can tell is correct.
(Isabelle accepts it.)
>
>> proposition
>> fixes a :: "'a::euclidean_space"
>> assumes "m ≠ n" and "m ∈ Basis" and "n ∈ Basis"
>> and ab_ne: "cbox a b ≠ {}" and an: "0 ≤ a ∙ n"
>> shows measurable_shear_interval: "(λx. ∑i∈Basis. (if i = m
then (x ∙ i) + (x ∙ n) else (x ∙ i)) *⇩R i) (cbox a b) ∈ lmeasurable" >> and measure_shear_interval: "measure lebesgue ((λx. ∑i∈Basis. (if i = m then (x ∙ i) + (x ∙ n) else (x ∙ i)) *⇩R i) cbox a b)
>> = measure lebesgue (cbox a b)”
>
>
> Everything was accompanied by quantities of plausible sounding narrative,
e.g. "this version is more abstract and elegant”. A sample:
>
>> This version is more type-safe because:
>> • No need to worry about index bounds
>> • The basis vectors are first-class entities
>> • The notation more clearly reflects the mathematical intent
>> Note that in this formulation:
>> • Basis is the canonical basis of the Euclidean space
>> • ∙ is the inner product (giving the component in a particular
basis direction)
>> • *⇩R is the scalar multiplication
>> • The sum reconstructs the vector from its components
>> Would you like me to explain any part of this formulation in more
detail? Or would you like assistance with proving this version of the theorem?
>
> My impression is that it was grabbing material from a variety of places,
in particular the AFP entry Ordinary_Differential_Equations. But the word
"elegant" doesn't appear there and in fact appears in the entire
distribution + AFP only 42 times.
>
> I recall doing the precise task above in the past and needing the better
part of an hour to get it right. I didn't try to port the proof (which was
long and complicated) and I would be surprised if it could get that right.
>
> I'm not sure how much more can be accomplished with the free version of
DeepSeek. I have also seen Claude Sonnet 4 swallow entire Isabelle codebases
and summarise them. These tools may already be useful for tasks such as
identifying duplicate material.
>
> Larry
>

smime.p7s

Email Gateway (Jul 25 2025 at 17:54):

From: "\"Vromen, H.J. (Huub)\"" <cl-isabelle-users@lists.cam.ac.uk>
Tobias,

I submitted the question of the monotonicity of the function f(x) = x * ln (1 + 1/x) to Deepseek. To that, I received a particularly detailed answer with many explanations of the thinking steps. Since Deepseek does not have the ability to share a conversation, I have put the answer in this PDF document.
Huub

Email Gateway (Jul 26 2025 at 14:04):

From: "\"Vromen, H.J. (Huub)\"" <cl-isabelle-users@lists.cam.ac.uk>
Tobias,

Here is the link to the PDF file.
https://surfdrive.surf.nl/files/index.php/s/lOFhqtRGMNYXx0Q

Huub

Van: Vromen, H.J. (Huub) <huub.vromen@ru.nl>
Datum: vrijdag, 25 juli 2025 om 11:10
Aan: cl-isabelle-users@lists.cam.ac.uk <cl-isabelle-users@lists.cam.ac.uk>
Onderwerp: Re: [isabelle] DeepSeek experiment
Tobias,

Email Gateway (Jul 28 2025 at 09:55):

From: Tobias Nipkow <nipkow@in.tum.de>
Thanks, Huub. That DeeopSeek proof looks good (I only checked the case x > 0).
Much better than what I had got out of chatGPT, but somebody else posted a
chatGPT proof that may be similar (I cannot find it just now).

Below you find Larry's short proof (for positive x) he sent to me. But it relies
on this corollary

corollary ln_diff_less: "0 < x ⟹ 0 < y ⟹ x ≠ y ⟹ ln x - ln y < (x - y) / y"

which in turn uses this lemma (from the library)

lemma ln_eq_minus_one:
assumes "0 < x" "ln x = x - 1"
shows "x = 1"

Its proof (in the library) uses the derivative. So using a second derivative
appears not unreasonable.

What I find interesting is that DeepSeek also considers
Alternative Approach: Using Series Expansion
that I had also toyed with. I think it yields the result directly w/o going via
derivatives (as DeepSeek does).

Tobias

Larry's proof:

lemma
defines "f ≡ (λx::real. x * ln (1 + 1/x))"
shows "strict_mono_on {0<..} f"
proof -
have "∃y. (f has_real_derivative y) (at x) ∧ 0 < y" if "x>0" for x :: real
proof -
have "(f has_real_derivative ln(x+1) - ln(x) - 1 / (x+1)) (at x)"
unfolding f_def using ‹x>0›
by (rule_tac derivative_eq_intros refl | force simp add: ln_div
divide_simps)+
then show ?thesis
using diff_gt_0_iff_gt ‹x>0› ln_diff_less[of x "x+1"] by auto
qed
then have "f r < f s" if "r < s" "0 < r" for r s::real
by (simp add: DERIV_pos_imp_increasing dual_order.strict_trans1 that)
then show ?thesis
by (blast intro: strict_mono_onI)
qed

On 26/07/2025 08:48, "Vromen, H.J. (Huub)" (via cl-isabelle-users Mailing List)
wrote:

Tobias,

Here is the link to the PDF file.

https://surfdrive.surf.nl/files/index.php/s/lOFhqtRGMNYXx0Q <https://
surfdrive.surf.nl/files/index.php/s/lOFhqtRGMNYXx0Q>

Huub

*Van: *Vromen, H.J. (Huub) <huub.vromen@ru.nl>
*Datum: *vrijdag, 25 juli 2025 om 11:10
*Aan: *cl-isabelle-users@lists.cam.ac.uk <cl-isabelle-users@lists.cam.ac.uk>
*Onderwerp: *Re: [isabelle] DeepSeek experiment

Tobias,

I submitted the question of the monotonicity of the function f(x) = x * ln (1 +
1/x) to Deepseek. To that, I received a particularly detailed answer with many
explanations of the thinking steps. Since Deepseek does not have the ability to
share a conversation, I have put the answer in this PDF document.

Huub

smime.p7s

Last updated: Jan 11 2026 at 16:28 UTC