Stream: Mirror: Isabelle Users Mailing List

Topic: [isabelle] How to debug the auto method?


view this post on Zulip Email Gateway (Apr 20 2024 at 11:16):

From: Karolis Petrauskas <k.petrauskas@gmail.com>
I'm trying to debug the TLA+ proof system (TLAPS), which uses Isabelle as a
backend.
The lemma below is proved by simp but fails to be proved by auto.
While I can set the proof method manually, multiple existing proofs use
auto as the default.
How can I find out why the auto method fails?

theory tlapm_e6d04a_debug
  imports Constant Zenon
begin

lemma ob1:
  fixes S
    and op
    and IsFiniteSet
    and CardSum
  assumes a1: "IsFiniteSet (S)"
      and a2: "∀ e ∈ (S) : op (e) ∈ Nat"
      and a4: "CardSum ((S), (op)) = 0"
      and a5: "
        ⋀ S' :: c. (
          (IsFiniteSet (S')) ⟹ (⋀ op :: c => c. (
            (∀ e ∈ (S') :  (op (e)) ∈ Nat) ⟹ (
              ((CardSum ((S'), (op))) = 0) ⟹ (
                ∀ e ∈ S' : (op (e)) = 0
              )
            )
          ))
        )"
  shows "∀ e ∈ S : op (e) = 0" (is "PROP ?P")
  using assms
  by auto (* NOTE: simp works here *)
end

This fragment is a cleaned-up version of a theory generated by TLAPS; the
original version can be found here (
https://gist.github.com/kape1395/84562bf5679fe9f9f65c4d85715b7827).

This proof is done in the context of the TLA+ theory that can be found here:
https://github.com/tlaplus/tlapm/tree/isabelle2020-dune-2024RC2/isabelle

The behavior looks the same in Isabelle2023 and Isabelle2024-RC2.

Karolis Petrauskas

view this post on Zulip Email Gateway (Apr 20 2024 at 12:11):

From: "Eugene W. Stark" <isabelle-users@starkeffect.com>
I'm certainly not an expert on how "auto" works under the covers, but I can offer some
practical advice based on my experience.

If the goal can be proved with simp but not with auto, then some of the extra things that
auto does beyond simp are getting in the way. Sometimes this can be lemmas that are
declared as "intro", "elim", or "dest" that are ill-formed for the type of reasoning
they are declared for. Sometimes this can be the result of case splitting that auto
does but simp does not. Another thing that I have sometimes seen is that although
auto has a certain capability for "solving" things by instantiating schematic variables,
sometimes it will choose the wrong instance as a "solution", thereby rendering the
resulting set of subgoals that are derived from this choice to be unprovable.

The first major distinction to make is whether auto "loops" or actually terminates with
a failure (you didn't say in your posting). If it "loops", it could be that ill-formed
"intro", "elim", etc. rules are the culprit. Or, it could have embarked on a case-splitting
adventure in which it spends a lot of time chopping things up into a huge number of
small cases. It's hard to give a general strategy for figuring out what is going on
if auto is not terminating, because you don't get any output. If you declared your own
"intro", "elim", etc. rules that were not in the base package that you were using, then
you might try commenting some of those declarations to see if you can get auto to terminate.
Of course, it could be the underlying package that you are using, but presumably someone
has worked with that longer and there is less likelihood of looping problems remaining.
Case splitting issues often tend to occur when working with function or datatype
definitions. However, those are not evident in the example you posted.

If "auto" terminates, then the failed subgoals it outputs upon termination are the place
to start. I would look at them closely to make sure that they are actually true
and that some kind of choice has not been made that created unprovable subgoals,
as I mentioned above. That strikes me as the most likely thing that would result in
auto terminating with failure when simp succeeds. If that is what is occurred,
then it might be possible to rewrite whatever rules auto is using, so the unification
or whatever it is doing to find the instantiations will not result in unprovable
subgoals.

I hope that is at least a little bit helpful and not totally off the mark!


Last updated: May 04 2024 at 16:18 UTC