[isabelle] Isabelle2013-1-RC3: try no longer seems to giv... · Archive Mirror: Isabelle Users Mailing List

Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] Isabelle2013-1-RC3: try no longer seems to giv...

Email Gateway (Aug 19 2022 at 12:14):

From: Christoph LANGE <math.semantic.web@gmail.com>
Dear Isabelle experts,

my impression (which may have been wrong) from Isabelle2013 was that
"try" tries "try0" before "sledgehammer". In Isabelle2013-1-RC3 this no
longer seems to be the case.

I find this quite annoying because I have made it a habit to enter "try"
after almost every statement. In many cases I don't know whether try0
would be sufficient to prove something, or whether a more complex proof
(such as those that sledgehammer finds) is required. Then, when
sledgehammer finds a proof, looking at the lemmas used by this proof
sometimes gives me the intuition that try0 might also suffice, e.g. when
I know that all of these lemmas are simp rules.

Here is a self-contained almost minimal example. I didn't try this with
older versions but I'd be happy to do so. Please ignore the first step:
maybe there is an easier proof for this, but for _this_ email only the
second step is relevant.

notepad
begin
fix R::"('a × 'b) set"
and N::"'a set"
and n::'a
have "{ (x, y) . (x, y) ∈ R ∧ x ≠ n }¯ = { (y, x) . (y, x) ∈ R¯ ∧ x ≠ n }"
by (smt Collect_cong converse_unfold curryE curry_split
mem_Collect_eq splitD split_cong)
then have "Domain { (x, y) . (x, y) ∈ R ∧ x ≠ n } = Range { (y, x) .
(y, x) ∈ R¯ ∧ x ≠ n}" try
end

On my machine try0 finds that simp/auto/force each take 2 ms.

However when I say "try", it resorts to running sledgehammer, which
tells me:

--- %< --- %< --- %< --- %< --- %< --- %< --- %< --- %< --- %< --- %< ---
Trying "solve_direct", "quickcheck", "try0", "sledgehammer", and
"nitpick"...
"z3": Sledgehammer ("z3") found a proof: by (metis Range_converse) (128 ms).
To minimize: sledgehammer min (Range_converse {(x, y). (x, y) ∈ R ∧ x ≠ n}¯ = {(y, x). (y, x) ∈ R¯ ∧ x ≠ n}).
"spass": Sledgehammer ("spass") found a proof: by (metis Range_converse)
(108 ms).
To minimize: sledgehammer min (Range_converse {(x, y). (x, y) ∈ R ∧ x ≠ n}¯ = {(y, x). (y, x) ∈ R¯ ∧ x ≠ n}).
"remote_vampire": Sledgehammer ("remote_vampire") found a proof: by
(metis Range_converse converse_unfold) (109 ms).
To minimize: sledgehammer min (Range_converse converse_unfold {(x, y). (x, y) ∈ R ∧ x ≠ n}¯ = {(y, x). (y, x) ∈ R¯ ∧ x ≠ n}).
"e": Sledgehammer ("e") found a proof: by (smt Collect_mem_eq
Domain_Collect_split Domain_converse Domain_unfold converse_converse
converse_unfold internal_split_def) (28 ms).
To minimize: sledgehammer min [smt] (Collect_mem_eq Domain_Collect_split
Domain_converse Domain_unfold converse_converse converse_unfold
internal_split_def {(x, y). (x, y) ∈ R ∧ x ≠ n}¯ = {(y, x). (y, x) ∈ R¯ ∧ x ≠ n}).
Structured proof (3 ms):
proof -
show "Domain {(x, y). (x, y) ∈ R ∧ x ≠ n} = Range {(y, x). (y, x) ∈ R¯
∧ x ≠ n}"
using Domain_unfold by auto
qed
--- %< --- %< --- %< --- %< --- %< --- %< --- %< --- %< --- %< --- %< ---

Here, it was clearly the last sledgehammer output that gave me the idea
that try0 can do the job. In other cases (I'll be happy to dig out some
more) it's not that easy.

Any ideas?

Cheers, and thanks in advance,

Christoph

Email Gateway (Aug 19 2022 at 12:14):

From: Christoph LANGE <math.semantic.web@gmail.com>
A minor correction:

2013-10-27 16:06 Christoph LANGE:

my impression (which may have been wrong) from Isabelle2013 was that
"try" tries "try0" before "sledgehammer". In Isabelle2013-1-RC3 this no
longer seems to be the case.

I find this quite annoying because I have made it a habit to enter "try"
after almost every statement. In many cases I don't know whether try0
would be sufficient to prove something, or whether a more complex proof
(such as those that sledgehammer finds) is required. Then, when
sledgehammer finds a proof, looking at the lemmas used by this proof
sometimes gives me the intuition that try0 might also suffice, e.g. when
I know that all of these lemmas are simp rules.

Here is a self-contained almost minimal example. I didn't try this with
older versions but I'd be happy to do so. Please ignore the first step:
maybe there is an easier proof for this, but for _this_ email only the
second step is relevant.

OK, I see this is not the most elegant example, as …

notepad
begin
fix R::"('a × 'b) set"
and N::"'a set"
and n::'a
have "{ (x, y) . (x, y) ∈ R ∧ x ≠ n }¯ = { (y, x) . (y, x) ∈ R¯ ∧ x ≠ n }"
by (smt Collect_cong converse_unfold curryE curry_split
mem_Collect_eq splitD split_cong)
then have "Domain { (x, y) . (x, y) ∈ R ∧ x ≠ n } = Range { (y, x) .
(y, x) ∈ R¯ ∧ x ≠ n}" try
end

On my machine try0 finds that simp/auto/force each take 2 ms.

However when I say "try", it resorts to running sledgehammer, which
tells me:

--- %< --- %< --- %< --- %< --- %< --- %< --- %< --- %< --- %< --- %< ---
Trying "solve_direct", "quickcheck", "try0", "sledgehammer", and
"nitpick"...
"z3": Sledgehammer ("z3") found a proof: by (metis Range_converse) (128 ms).
To minimize: sledgehammer min (Range_converse {(x, y). (x, y) ∈ R ∧ x ≠ n}¯ = {(y, x). (y, x) ∈ R¯ ∧ x ≠ n}).
"spass": Sledgehammer ("spass") found a proof: by (metis Range_converse)
(108 ms).
To minimize: sledgehammer min (Range_converse {(x, y). (x, y) ∈ R ∧ x ≠ n}¯ = {(y, x). (y, x) ∈ R¯ ∧ x ≠ n}).
"remote_vampire": Sledgehammer ("remote_vampire") found a proof: by
(metis Range_converse converse_unfold) (109 ms).
To minimize: sledgehammer min (Range_converse converse_unfold {(x, y). (x, y) ∈ R ∧ x ≠ n}¯ = {(y, x). (y, x) ∈ R¯ ∧ x ≠ n}).
"e": Sledgehammer ("e") found a proof: by (smt Collect_mem_eq
Domain_Collect_split Domain_converse Domain_unfold converse_converse
converse_unfold internal_split_def) (28 ms).
To minimize: sledgehammer min [smt] (Collect_mem_eq Domain_Collect_split
Domain_converse Domain_unfold converse_converse converse_unfold
internal_split_def {(x, y). (x, y) ∈ R ∧ x ≠ n}¯ = {(y, x). (y, x) ∈ R¯ ∧ x ≠ n}).
Structured proof (3 ms):
proof -
show "Domain {(x, y). (x, y) ∈ R ∧ x ≠ n} = Range {(y, x). (y, x) ∈ R¯
∧ x ≠ n}"
using Domain_unfold by auto
qed

… this shows that the first proof step is not necessary at all, because
"using Domain_unfold by auto" suffices to establish the second step
without the first step. Still this doesn't affect my point that "try"
seems to give preference to "sledgehammer" over "try0".

--- %< --- %< --- %< --- %< --- %< --- %< --- %< --- %< --- %< --- %< ---

Here, it was clearly the last sledgehammer output that gave me the idea
that try0 can do the job. In other cases (I'll be happy to dig out some
more) it's not that easy.

Any ideas?

Cheers, and thanks in advance,

Christoph

Email Gateway (Aug 19 2022 at 12:28):

From: Jasmin Blanchette <jasmin.blanchette@gmail.com>
Hi Christoph,

Sorry for the delay in answering.

"try" does a number of things in parallel, using the ML function "Par_List.get_some". Because it may use several threads in parallel, there are no guarantees about which result will be printed first. Usually, "try0" results tend to be shown first because the "try0" thread is launched first and it tends to return first as well; but Sledgehammer has become faster, and "try0" is only as fast as its slowest tool (with a timeout of 5 s).

In short, this does not appear to be a deep difference between Isabelle2013-1-RC1 and Isabelle2013, but rather a matter of (bad) luck.

Still, the behavior you described is somewhat undesirable, so I've now made it less likely to happen in the repository version of Isabelle (change c7af3d651658). When "try0" is invoked from "try", it now stops as soon as it has found one proof, instead of reporting detailed timings for different tools.

Since this is not a true regression, and there is an easy workaround (i.e. invoke "try0" directly), I will not ask Makarius to add the change to the final version of Isabelle2013-1. If this is very important to you, I would suggest that you apply the patch locally. In any case, the patch should be part of (the expected) Isabelle2014.

Thank you for your report.

Regards,

Jasmin

Last updated: Jul 18 2025 at 20:25 UTC