Stream: Mirror: Isabelle Users Mailing List

Topic: [isabelle] Simple small example -- running into problems


view this post on Zulip Email Gateway (Jan 08 2024 at 16:27):

From: Christoph Benzmueller <c.benzmueller@gmail.com>
Dear Isabelle Users List,

attached is a very simple example for sledgehammer that should be easy to
solve and reconstruct, but this example seems to run into unexpected
problems:

[image: image.png]

Best,
Christoph

image.png
SimpleStuff.thy

view this post on Zulip Email Gateway (Jan 09 2024 at 09:32):

From: Jasmin Blanchette <jasmin.blanchette@ifi.lmu.de>
Dear Christoph,

Thanks for reporting this issue! The spectacular failure of "fastforce" -- when Sledgehammer says "18 ms" -- is definitely worth investigating. I'm a bit overwhelmed the coming weeks, but I'll come back to you if I find a fix.

Concerning the provers used by Sledgehammer:

With Isabelle2022, I get that some remote provers, leo2 and agsyhol, can solve your example.

With Isabelle2023, I get the same result as you.

With a recent repository version of Isabelle, I get that E finds a proof.

I don't understand why you call your problem "a very simple problem". It's very higher-order and as such unsuitable for most provers based on encodings, including "metis" -- and there's no automation we can use in Isabelle that can reconstruct these things. It is interesting, though, that Zipperposition fails where other higher-order provers succeed.

Best,
Jasmin

view this post on Zulip Email Gateway (Jan 09 2024 at 11:13):

From: Christoph Benzmueller <c.benzmueller@gmail.com>
Hi Jasmin,

no, worry, there is no time pressure to look at this failure of
"fastforce", but it clearly seems to be an issue to be resolved.

The reason I named this "SimpleStuff" is that
(i) veteran higher-order provers like agsyHOL, Satallax and Leo-II can
solve this very quickly (maybe even Leo-I could already, I need to check)
(ii) only a very limited amount of higher-order reasoning (e.g. in the form
of primitive substitutions) is needed, so this example (which is exemplary
for a whole class of related examples) should not be neglected in the
search space of today's higher-order proof systems, given the great ongoing
progress in this area.

Best wishes,
Christoph

view this post on Zulip Email Gateway (Jan 09 2024 at 11:31):

From: Manuel Eberl <manuel@pruvisto.org>
Out of curiosity, I took a look at what you mean by a "spectacular
failure of fastforce" and found that you get an "Enter Match" message
and everything basically freezes up.

I've never quite understood what this "Enter Match" thing is and whether
the fact that it happens sometimes is something that we just have to
live with. But the fact that there's a lot of nested occurrences of
"remdups" in that "Enter Match" message makes me wonder whether there is
perhaps a poorly-chosen rule involving "remdups" in the claset and
whether someone ought to investigate this.

Manuel

view this post on Zulip Email Gateway (Jan 09 2024 at 16:50):

From: Stepan Holub <cl-isabelle-users@lists.cam.ac.uk>
Dear Christoph,

I wonder  why you use the axiomatized type.
The theorem holds for any type, and the sledgehammer finds the proof
(vampire):

lemma "∃ F :: ('a ⇒ bool) ⇒ 'a. ∀ t::'a. (∃ S. F S = t)"
  by (metis some_sym_eq_trivial)

As Manuel reports, the problem is with fastforce which is suggested by
zipperposition (and by try0), but it freezes the whole Isabelle.

Stepan

image.png

view this post on Zulip Email Gateway (Jan 10 2024 at 09:17):

From: Christoph Benzmueller <c.benzmueller@gmail.com>
Dear Stepan,

as a reply to your suggestion/question I have a four comments:

(i) Proving theorems automatically by generalisation and subsequent
instantiation is a great research topic and there has been interesting work
in the past (e.g. in the context of inductive theorem proving). So, if you
have (or will have) tools that automatically do what you suggest, then this
is really cool and I would like to learn more about it. (Replacing
automatic proof by user-interaction/semi-automatic proof is not so much in
my interest here).

(ii) A slight variation of theorem T1 is theorem T2 below. For this one
your generalisation recipe does (obviously) not work, since there is now a
countermodel.

[image: image.png]

(iii) Such problems may come from concrete application contexts, where
instead of reasoning with arbitrary cardinalities (as you suggest) one
might be interested to reason with exactly controlled domain cardinalities
(like here for type i). Eg. in the context of our previous
computer-supported explorations in category theory (with Dana Scott) we
were also interested in enumerating functors in a controlled way when
fixing certain domain cardinalities. Moreover, as (ii) shows,
generalisation may not be a general answer.

(iv) Even if there are alternative solutions to problems like T1 or T2, it
nevertheless makes sense to make our proof assistants as robust and
powerful as possible.

Best wishes,
Christoph

image.png

view this post on Zulip Email Gateway (Jan 10 2024 at 10:23):

From: Stepan Holub <cl-isabelle-users@lists.cam.ac.uk>
Indeed, for T2 there is
"zipperposition: One-line proof reconstruction failed: by metis"
so this is a sledgehammer reconstruction problem going beyond the
strange behavior of fastforce.

S.

image.png

view this post on Zulip Email Gateway (Jan 10 2024 at 11:24):

From: Lawrence Paulson <lp15@cam.ac.uk>
That message is higher-order unification talking to you. It's a sign that a nontrivial HO unification (i.e. something beyond pattern unification) is going deeper than usual. There was an issue on this mailing list within the past year caused by the unification depth bound being exceeded. By increasing the relevant depth parameter, the failure could be eliminated. It's possible that the mere output of tracing messages produces so much overhead that the proof will fail.

We could also consider increasing the defaults. I'm not sure that they have increased much in nearly 40 years of Moore’s Law.

Larry


Last updated: Apr 29 2024 at 01:08 UTC