[isabelle] strange sledgehammer behaviour · Archive Mirror: Isabelle Users Mailing List

Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] strange sledgehammer behaviour

Email Gateway (Aug 22 2022 at 17:03):

From: noam neer <noamneer@gmail.com>
Hi everybody,

I have Isabelle 2017 with the default installation. I tried sledgehammer
several times in the last few days on the following simple problem -

theory tmp
imports Complex_Main
begin
lemma "((9::real) powr (1/2)) = 3"

the results were inconsistent. sometimes it didn't return, sometimes it
returned with the correct suggestion to use "by auto". one of the days it
worked but only when the "3" was replaced by "(3::real)". sometimes the
"sledgehammer" command and the sledgehammer panel gave different results.

my questions are - is this normal behaviour? if not, should I try
reinstalling? if yes, what's causing this inconsistent behaviour? is there
a way to make it more consistent?

thanx,
Noam

Email Gateway (Aug 22 2022 at 17:03):

From: "Dr A. Koutsoukou-Argyraki" <ak2110@cam.ac.uk>
Dear Noam, dear all,

even though I don't know what could be the cause of this (I am an early
beginner)
I'd just like to mention that I have also noticed something similar,
in particular with the cvc4 prover.
In the two very simple examples (see below)
all the provers failed, but after I tried a few days later cvc4 did give
a proof (while the rest of the provers still failed)

Example1:

have " 1/ ((b^2)*((sqrt(2))+(a/b)) )≤ (¦ a^2 - 2*(b^2) ¦ ) *1/
((b^2)*((sqrt(2))+(a/b) )) "
using ‹ ¦ a^2 - 2*(b^2) ¦ ≥1›

by (smt assms(3) assms(4) divide_pos_pos frac_le mult_pos_pos
of_int_1_le_iff real_sqrt_gt_1_iff zero_less_power)

Example2:

have " 1/((b^2)*3) ≤ 1/ ((b^2)*((sqrt(2))+(a/b)) ) "
using ‹(b^2)*3) ≥ ((b^2)*((sqrt(2))+(a/b) ) )›
by (smt assms(3) assms(4) divide_pos_pos frac_le mult_pos_pos
of_int_pos real_sqrt_gt_1_iff zero_less_power)

(* the above proofs were found by cvc4 even though a few days earlier
cvc4 had failed.
assms(3) assms(4) correspond to "a >0" and "b >0"*)

I'm also wondering what could be the reason for this.
Best,
Angeliki

Email Gateway (Aug 22 2022 at 17:04):

From: noam neer <noamneer@gmail.com>
in my case, the inconsistent prover seems to be "e".

Email Gateway (Aug 22 2022 at 17:04):

From: Lawrence Paulson <lp15@cam.ac.uk>
One possibility is that the proof is found near the end of the time limit. It's quite common to have runtime variations on the order of 10% depending on what else your computer is doing.
Larry

Email Gateway (Aug 22 2022 at 17:05):

From: noam neer <noamneer@gmail.com>
just tried it again after rebooting my computer, so only Isabelle was
running.
didn't find a proof.

Email Gateway (Aug 22 2022 at 17:05):

From: noam neer <noamneer@gmail.com>
does any of the provers communicate through the internet, s.t. its results
depend on this communication?

Email Gateway (Aug 22 2022 at 17:05):

From: Mathias Fleury <mathias.fleury12@gmail.com>
Dear Noam,

On 18. Apr 2018, at 14:21, noam neer <noamneer@gmail.com> wrote:

does any of the provers communicate through the internet, s.t. its results
depend on this communication?

By default only vampire communicates through the internet.

On Sat, Apr 14, 2018 at 10:05 AM, noam neer <noamneer@gmail.com> wrote:

just tried it again after rebooting my computer, so only Isabelle was
running.

Actually, I would expect that right after starting plenty programs are running in background to look for updates.

Moreover, you are assuming that:
your system is deterministic. But an antivirus might delay the start. As Larry mentions, there is a ~10% runtime variation.
the solvers are deterministic. This is a non-obvious assumption. A quick look over the source code of CVC4 shows several occurrences of "random" (https://github.com/CVC4/CVC4/search?l=C%2B%2B&q=random&type= <https://github.com/CVC4/CVC4/search?l=C++&q=random&type=>).
the problems generated by several successive sledgehammer calls are the same. Which might or might not be correct depending on many factors, as there is some machine learning running (see "MaSh: Machine Learning for Sledgehammer", from Daniel Kühlwein, Jasmin Christian Blanchette, Cezary Kaliszyk, and Josef Urban).

Altogether, you should just be happy when sledgehammer finds a proof that it did not find before or finds a better one.

a way to make it more consistent?

Don't use sledgehammer ;)

Best,
Mathias

didn't find a proof.

On Fri, Apr 13, 2018 at 12:59 PM, Lawrence Paulson <lp15@cam.ac.uk> wrote:

One possibility is that the proof is found near the end of the time
limit. It's quite common to have runtime variations on the order of 10%
depending on what else your computer is doing.
Larry

On 9 Apr 2018, at 17:03, Dr A. Koutsoukou-Argyraki <ak2110@cam.ac.uk>
wrote:

Dear Noam, dear all,

even though I don't know what could be the cause of this (I am an early
beginner)
I'd just like to mention that I have also noticed something similar,
in particular with the cvc4 prover.
In the two very simple examples (see below)
all the provers failed, but after I tried a few days later cvc4 did
give a proof (while the rest of the provers still failed)

Example1:

have " 1/ ((b^2)*((sqrt(2))+(a/b)) )≤ (¦ a^2 - 2*(b^2) ¦ ) *1/
((b^2)*((sqrt(2))+(a/b) )) "
using ‹ ¦ a^2 - 2*(b^2) ¦ ≥1›

by (smt assms(3) assms(4) divide_pos_pos frac_le mult_pos_pos
of_int_1_le_iff real_sqrt_gt_1_iff zero_less_power)

Example2:

have " 1/((b^2)*3) ≤ 1/ ((b^2)*((sqrt(2))+(a/b)) ) "
using ‹(b^2)*3) ≥ ((b^2)*((sqrt(2))+(a/b) ) )›
by (smt assms(3) assms(4) divide_pos_pos frac_le mult_pos_pos
of_int_pos real_sqrt_gt_1_iff zero_less_power)

(* the above proofs were found by cvc4 even though a few days earlier
cvc4 had failed.
assms(3) assms(4) correspond to "a >0" and "b >0"*)

I'm also wondering what could be the reason for this.
Best,
Angeliki

On 2018-04-09 02:58, noam neer wrote:

Hi everybody,
I have Isabelle 2017 with the default installation. I tried
sledgehammer
several times in the last few days on the following simple problem -
theory tmp
imports Complex_Main
begin
lemma "((9::real) powr (1/2)) = 3"
the results were inconsistent. sometimes it didn't return, sometimes it
returned with the correct suggestion to use "by auto". one of the days
it
worked but only when the "3" was replaced by "(3::real)". sometimes
the
"sledgehammer" command and the sledgehammer panel gave different
results.
my questions are - is this normal behaviour? if not, should I try
reinstalling? if yes, what's causing this inconsistent behaviour? is
there
a way to make it more consistent?
thanx,
Noam

Last updated: Jul 18 2025 at 20:25 UTC