Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] Sledgehammer 2015 oddity


view this post on Zulip Email Gateway (Aug 22 2022 at 09:57):

From: Thomas Sewell <thomas.sewell@nicta.com.au>
Hello isabelle-users.

A friend of mine has recently updated to Isabelle 2015 and has come
across what looks like a sledgehammer bug.

Short summary: in Isabelle 2015 all sledgehammer provers recommend "by
auto" to solve a goal that can be solved trivially but not by auto.

A demo theory is attached.

Cheers,
Thomas.


The information in this e-mail may be confidential and subject to legal professional privilege and/or copyright. National ICT Australia Limited accepts no liability for any damage caused by this email or its attachments.
Test.thy

view this post on Zulip Email Gateway (Aug 22 2022 at 09:58):

From: Jasmin Blanchette <jasmin.blanchette@inria.fr>
Hi Thomas,

Short summary: in Isabelle 2015 all sledgehammer provers recommend "by
auto" to solve a goal that can be solved trivially but not by auto.

Thanks for the example. I am very much dependent on user-provided examples for debugging this feature.

Here, the problem is that Sledgehammer noticed that “using foo bar by auto” solved the problem, then its minimized kicked in and noticed that “foo” and “bar” were in the simpset and simply took them away. In most cases, this is a useful, time- and CPU-saving strategy, but in your example taking out “foo" turned out to be deadly. See change 7d278b0617d3 — unfortunately too late for Isabelle2015. The change joins 2c468c062589 and 533f6cfc04c0, done earlier this year.

Appeal to the list: Please let me know if you run into other examples where either Sledgehammer finds a proof that doesn’t work and it claims it does or vice versa. The issues are normally easy to track down and address if you can provide me with an example that reproduces the problem.

Jasmin

view this post on Zulip Email Gateway (Aug 22 2022 at 10:02):

From: "Eugene W. Stark" <isabelle-users@starkeffect.com>
Here is one I just ran into. It was easy to abstract from the rest of the
theory so I am posting it.

- Gene Stark

theory Barf
imports Main
begin

definition graph :: "'a set ⇒ 'a set ⇒ ('a ⇒ 'a) ⇒ 'a ⇒ 'a ⇒ bool"
(*

* With the more general type "'a set ⇒ ' b set ⇒ ('a ⇒ 'b) ⇒ 'a ⇒ 'b ⇒ bool"
* the proof by fastforce succeeds below.
*)
where "graph A B f a b = (a ∈ A ∧ b ∈ B ∧ f a = b)"

lemma "a ∈ A ∧ b ∈ B ∧ c ∈ C ∧ f a = b ∧ g b = c ⟹ graph A C (g o f) a c"
using graph_def
(* try0 (* Says: 'Try this: by fastforce' *) *)
by fastforce (* Fails to prove lemma -- unification bound exceeded, etc. *)

end
Barf.thy

view this post on Zulip Email Gateway (Aug 22 2022 at 10:19):

From: Makarius <makarius@sketis.net>
The above actually works, but fastforce is stalled and waiting for
confirmation to spill out more "potentially useful tracing messages" at
the very bottom.

This is a light form of denial-of-service attack on the front-end. A
heavy form would be a bombing of the IDE process with tons of pointless
messages.

The above example can be made to work as follows:

using [[unify_trace_bound = 60]]
by fastforce

Sledgehammer is internally very careful to provide a sensible value to
such options, see "silence_methods" here:
http://isabelle.in.tum.de/repos/isabelle/file/Isabelle2015/src/HOL/Tools/try0.ML#l104

That setup is not active for the generated proof snippet, though.

Makarius

view this post on Zulip Email Gateway (Aug 22 2022 at 11:03):

From: "Eugene W. Stark" <isabelle-users@starkeffect.com>
I just ran into a variant of this which I think is a bona fide example of what
Jasmin is looking for.

- Gene Stark

theory Barf
imports Main
begin

definition graph :: "'a set ⇒ 'b set ⇒ ('a ⇒ 'b) ⇒ ('a × 'b) set"
where "graph A B f = {(a, b). a ∈ A ∧ b ∈ B ∧ f a = b}"

lemma "graph A B f ⊆ (graph A B f) O (Id_on B)"
(* try
Trying "solve_direct", "quickcheck", "try0", "sledgehammer", and "nitpick"...
Ignoring duplicate rewrite rule:
?a1 ∈ Collect ?P1 ≡ ?P1 ?a1
Ignoring duplicate rewrite rule:
snd (?x1.1, ?y) ≡ ?y
SMT: Solver z3: Counterexample found (possibly spurious)
"cvc4": Try this: by (smt Collect_splitD Id_onI graph_def relcomp.simps snd_conv subrelI) (754 ms).

The above one-liner succeeds. However, see below...
*)
using graph_def
(* try
"cvc4": Try this: by auto (37 ms).
"z3": Try this: by auto (53 ms).
*)
by auto
(* Failed to finish proof⌂:
goal (1 subgoal):

1. ⋀a b. (⋀A B f. graph A B f = {(a, b). a ∈ A ∧ b ∈ B ∧ f a = b}) ⟹
(a, b) ∈ graph A B f ⟹ (a, b) ∈ graph A B f O Id_on B
*)

end
Barf.thy

view this post on Zulip Email Gateway (Aug 22 2022 at 11:41):

From: Jasmin Blanchette <jasmin.blanchette@inria.fr>
Dear Eugene,

Thank you for your report. It appears to work fine with a recent repository version (e.g. 48600872b12c). After the 2015 release, I sorted out a number of issues related to preplaying.

Regards,

Jasmin


Last updated: Apr 23 2024 at 12:29 UTC