Stream: Mirror: Isabelle Users Mailing List

Topic: [isabelle] Mistakes in and fixes to Isabelle proofs


view this post on Zulip Email Gateway (Jun 13 2022 at 18:14):

From: Talia Ringer <tringer@cs.washington.edu>
Hi all, hope you're doing well! I'm looking to evaluate a machine learning
tool on a task related to proof repair for Isabelle. I'm curious if anyone
has anything that might help us with benchmarks for this, like:

  1. examples of real mistakes in Isabelle proofs,
  2. examples of real fixes to broken Isabelle proofs,
  3. anecdotes about common ways they mess up when writing proofs,
  4. anecdotes about concrete ways they've messed up on particular proofs in
    the past, even if not documented in version control history.

I'm also curious to what degree the kinds of changes seen in the Coq change
data in REPLICA <https://dependenttyp.es/pdf/analytics.pdf> are similar to
those people make in Isabelle, and where and how they differ, though that
may be a complex question that is not easy to address in an email.

I super welcome any pointers any of you have!

Talia

view this post on Zulip Email Gateway (Jun 13 2022 at 18:57):

From: Manuel Eberl <manuel@pruvisto.org>
Hello,

I don't understand what you mean by "mistakes in proofs". The whole
point of having a proof assistant is that you cannot make mistakes in
your proofs, is it not?

Or do you mean mistakes in definitions/theorem statements? Or perhaps
"mistakes" in the sense of "bad style"? Or "mistakes" as in "existing
proof breaks due to some changes elsewhere in the library/in the system"?

Cheers,

Manuel

view this post on Zulip Email Gateway (Jun 13 2022 at 19:16):

From: Talia Ringer <tringer@cs.washington.edu>
I know very well how proof assistants work :). Mistakes in proofs happen
when you are attempting to prove something, and you try a particular
approach to your proof, but it doesn't work, so you need to go back and
revisit. See, for example, the linked paper.

Talia

view this post on Zulip Email Gateway (Jun 13 2022 at 21:22):

From: Manuel Eberl <manuel@pruvisto.org>
I did not mean to suggest that you don't (my apologies if it sounded that way), but I for one could not see what you meant, and I deemed it likely that others also might not.

It might have become clear from reading the paper, but I think you are likely to get more responses from people if they don't have to read a paper before understanding what you mean.

As for your question, here are a few fairly low-level things off the top of my head:

Most of these are things that you tend to learn very early as an Isabelle user, but especially the polymorphism and the coercions happen even to experienced users and often take a minute or so to figure out.

For a more high-level thing, reindexing arguments for sums are often fairly easy with "sum.reindex_bij_witness". If you don't know that trick, they can be fairly tedious.

Let me know if you want be to explain any of these more. I'll let you know if I think of anything else.

Cheers

Manuel

view this post on Zulip Email Gateway (Jun 13 2022 at 21:45):

From: Tjark Weber <tjark.weber@it.uu.se>
Talia,

view this post on Zulip Email Gateway (Jun 14 2022 at 09:43):

From: Peter Lammich <lammich@in.tum.de>
I second Manuel's points. Here are a few additions:

On 13/06/2022 22:21, Manuel Eberl wrote:

This comes in a few varieties. One is forgetting type annotations in
lemmas, as Manuel mentions. Related to that, you sometimes define too
generic constants for your concepts. For example, you might want to
define a shallowly embedded semantics on some memory type mem. For
operations that do not modify memory, it's tempting to do something
like: definition "add r1 r2 m = (r1+r2,m)". If you forget a type
constraint, the constant is too generic, and might require type
annotations later on ... though here, the "correct" way would be to add
a type constraint to the definition.

definition "assert p ≡ if p then Some () else None"

lemma
        monad_laws: "a⤜Some = a" "Some x⤜f = f x"
    and assert_bind: "assert p⤜f = (if p then f () else None)"
    and garbage: "x=y"

note: x and y have type unit in all four statements!

The simplifier (and classical set) is one of Isabelle's aspects that
require a lot of experience, and there are only rough informal rules how
a good simpset should look like.

lemma [simp]: "f (1::nat) = (2::nat)" will never be applied by the
simplifier, as the simplifier will first rewrite 1 = Suc 0

interpretation my_locale "(1::nat)"

all lemmas and abbreviations will use 1::nat, and break the first time
the simplifier is run. Automatically declaring cong-rules would help
here, but I don't remember why that isn't done by default.

view this post on Zulip Email Gateway (Jun 14 2022 at 10:55):

From: Manuel Eberl <manuel@pruvisto.org>
Regarding the "proofs break due to development further up in the theory
hierarchy" thing, I think there are mainly two things that happen:

  1. the name of some definition/theorem changes

  2. the content of some definition (including the type) or theorem changes

Typically such changes make things more general (e.g. a definition with
a more general type, or a theorem with fewer/weaker premises), but the
other direction also happens.

Things becoming less general is of course more likely to cause a
problem, but even things becoming more general can become a problem,
e.g. if an Isar proof explicitly proves something that doesn't have to
be proved anymore.

Changes of names of course also leads to problems. A definition name
changing can break a lot of things and is thus rarely done (it can
happen accidentally when import order changes, e.g. with
MPoly_Type.degree and Polynomial.degree). Fact names changing happens
more often and also breaks proofs that explicitly mention these facts by
names. The most fragile ones here are sledgehammer-generated proofs
using metis/smt. But these can typically be fixed easily by just running
sledgehammer again.

Occasionally, things also break in much more obscure ways, e.g. the
automation failing or looping on something it could previously do.
Repairing this then typically involves again breaking the goal down into
smaller/more explicit steps. This happens particularly in proofs that
are written in "bad style" (in my opinion).

Manuel

view this post on Zulip Email Gateway (Jun 14 2022 at 11:30):

From: Lawrence Paulson <lp15@cam.ac.uk>
There's a whole class of such errors that fall under the general heading of "omitting something obvious". Other people have already mentioned instances of this, in particular omitting type constraints or type class constraints, which makes the claim too general. Also under this heading: omitting assertions that are too obvious (e.g. that any mentioned divisors are nonzero, or for a theorem about an integral, forgetting to assume that the given function has an integral).

Another class of errors comes under "setting up induction wrong". This might mean using the wrong induction rule, trying to prove something that is insufficiently general, or incorrectly incorporating site conditions into the induction. This latter error is Isabelle specific and has to do with supplying (e.g. via "using") side conditions into the induction method and being sure to specify any variables in them as "arbitrary".

Larry

view this post on Zulip Email Gateway (Jun 15 2022 at 01:51):

From: Talia Ringer <tringer@cs.washington.edu>
Thanks, all, these are super helpful!


Last updated: Jan 04 2025 at 20:18 UTC