From: Talia Ringer <tringer@cs.washington.edu>
Hi all, hope you're doing well! I'm looking to evaluate a machine learning
tool on a task related to proof repair for Isabelle. I'm curious if anyone
has anything that might help us with benchmarks for this, like:
I'm also curious to what degree the kinds of changes seen in the Coq change
data in REPLICA <https://dependenttyp.es/pdf/analytics.pdf> are similar to
those people make in Isabelle, and where and how they differ, though that
may be a complex question that is not easy to address in an email.
I super welcome any pointers any of you have!
Talia
From: Manuel Eberl <manuel@pruvisto.org>
Hello,
I don't understand what you mean by "mistakes in proofs". The whole
point of having a proof assistant is that you cannot make mistakes in
your proofs, is it not?
Or do you mean mistakes in definitions/theorem statements? Or perhaps
"mistakes" in the sense of "bad style"? Or "mistakes" as in "existing
proof breaks due to some changes elsewhere in the library/in the system"?
Cheers,
Manuel
From: Talia Ringer <tringer@cs.washington.edu>
I know very well how proof assistants work :). Mistakes in proofs happen
when you are attempting to prove something, and you try a particular
approach to your proof, but it doesn't work, so you need to go back and
revisit. See, for example, the linked paper.
Talia
From: Manuel Eberl <manuel@pruvisto.org>
I did not mean to suggest that you don't (my apologies if it sounded that way), but I for one could not see what you meant, and I deemed it likely that others also might not.
It might have become clear from reading the paper, but I think you are likely to get more responses from people if they don't have to read a paper before understanding what you mean.
As for your question, here are a few fairly low-level things off the top of my head:
- You try to prove something but the type is too polymorphic. You have to add a type constraint (e.g. 0 :: nat) somewhere, or a more specific type class constraint, and then it works. This is particularly tricky because you can have two terms that look exactly the same when printed but aren't.
- Coercions got inserted in the wrong place; the fix is to add them manually in the right place.
- You forgot to declare some variable or term as "arbitrary" in an induction.
- You have to manually instantiate a fact that you use to help the automation along.
- Looping simplifier (but that tends to be less obvious to fix).
- A rule doesn't quite apply because of some minor discrepancy (e.g. a + b instead of b + a, or different bracketing).
- Very technical: "using A B by (rule foo) some_method" does not work because "rule" tries to match the chained facts A and B against the premises of "foo". You need to write e.g. "by (rule foo) (use A B in some_method)" instead.
- The simplifier does not rewrite underneath "if" by default; you need to add if_cong as a congruence rule manually.
- Very generally: auto/simp get stuck somewhere, but with a few extra "simp" rules the proof goes through.
- You added too many facts (and/or facts that are too general) to your goal state, which causes the automation to get lost and not terminate. The fix is to break the proof up into smaller Isar steps, or to remove unnecessary facts or instantiate them by hand to narrow down the search space.
Most of these are things that you tend to learn very early as an Isabelle user, but especially the polymorphism and the coercions happen even to experienced users and often take a minute or so to figure out.
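Manuel's first point can be illustrated with a minimal self-contained sketch:

```isabelle
(* Too polymorphic: x is inferred at sort {plus, zero},
   about which nothing useful is known, so the proof fails *)
lemma "x + 0 = x"
  oops

(* A type constraint fixes it *)
lemma "x + (0::nat) = x"
  by simp
```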
For a more high-level example: reindexing sums is often fairly easy with "sum.reindex_bij_witness"; if you don't know that trick, it can be fairly tedious.
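A hedged sketch of what that looks like for a simple index shift (the exact witness instantiation may need tweaking):

```isabelle
(* Shift the summation index by one; i and j are the two witness
   functions mapping the index sets into each other *)
lemma "(∑i∈{Suc 0..n}. f (i - 1)) = (∑i∈{0..<n}. f i)"
  by (rule sum.reindex_bij_witness[where i = Suc and j = "λi. i - 1"]) auto
```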
Let me know if you want me to explain any of these in more detail. I'll let you know if I think of anything else.
Cheers
Manuel
From: Tjark Weber <tjark.weber@it.uu.se>
Talia,
From: Peter Lammich <lammich@in.tum.de>
I second Manuel's points. Here are a few additions:
On 13/06/2022 22:21, Manuel Eberl wrote:
- You try to prove something but the type is too polymorphic. You have
to add a type constraint (e.g. 0 :: nat) somewhere or a more specific
type class constraint and then it works. This is particularly tricky
because you can have two terms that look exactly the same when printed
but aren't.
This comes in a few varieties. One is forgetting type annotations in lemmas, as Manuel mentions. Related to that, you sometimes define constants for your concepts that are too generic. For example, you might want to define a shallowly embedded semantics on some memory type mem. For operations that do not modify memory, it's tempting to do something like:

definition "add r1 r2 m = (r1+r2,m)"

If you forget a type constraint, the constant is too generic and might require type annotations later on ... though here, the "correct" way would be to add a type constraint to the definition.
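A minimal sketch of that fix (mem is just the placeholder memory type from the example above; any concrete type works):

```isabelle
(* With an explicit signature, the constant gets the intended type *)
definition add :: "nat ⇒ nat ⇒ mem ⇒ nat × mem" where
  "add r1 r2 m = (r1 + r2, m)"
```

Without the signature, type inference gives add the overly general type 'a ⇒ 'a ⇒ 'b ⇒ 'a × 'b (for 'a :: plus), and uses of the constant may then need annotations later on.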
Another variety: when you state several lemmas in a single command, type inference unifies the types across all of the statements, which can silently make one of them trivial. For example:

definition "assert p ≡ if p then Some () else None"

lemma
  monad_laws: "a⤜Some = a" "Some x⤜f = f x"
  and assert_bind: "assert p⤜f = (if p then f () else None)"
  and garbage: "x=y"

Note: x and y have type unit in all four statements (sharing f with assert_bind forces this), so "garbage" is trivially true rather than the general fact it appears to be!
- coercions got inserted in the wrong place; the fix is to add them manually in the right place.

(This was (and still is) my main point of critique on coercions: they can alter the meaning of your (main) theorem without you noticing.)

- Looping simplifier (but that tends to be less obvious to fix)

The simplifier (and the classical reasoner) is one of the aspects of Isabelle that require a lot of experience, and there are only rough informal rules for what a good simpset should look like.
lemma [simp]: "f (1::nat) = (2::nat)" will never be applied by the simplifier, as the simplifier will first rewrite 1 to Suc 0.
Adding a new simplification rule can invalidate existing ones. Similarly, a rule like "1 = Suc 0" could be added later, unwittingly making existing simplifier setup unusable.
Simp rules also interfere with locale interpretations. After

interpretation my_locale "(1::nat)"

all lemmas and abbreviations will use 1::nat, and they break the first time the simplifier is run. Automatically declaring cong-rules would help here, but I don't remember why that isn't done by default.
From: Manuel Eberl <manuel@pruvisto.org>
Regarding the "proofs break due to development further up in the theory
hierarchy" thing, I think there are mainly two things that happen:
- the name of some definition/theorem changes
- the content of some definition (including the type) or theorem changes
Typically such changes make things more general (e.g. a definition with
a more general type, or a theorem with fewer/weaker premises), but the
other direction also happens.
Things becoming less general is of course more likely to cause a problem, but even things becoming more general can cause problems, e.g. if an Isar proof explicitly proves something that no longer needs to be proved.
Changes of names of course also lead to problems. A definition name
changing can break a lot of things and is thus rarely done (it can
happen accidentally when import order changes, e.g. with
MPoly_Type.degree and Polynomial.degree). Fact names changing happens
more often and also breaks proofs that explicitly mention these facts by name. The most fragile ones here are sledgehammer-generated proofs
using metis/smt. But these can typically be fixed easily by just running
sledgehammer again.
Occasionally, things also break in much more obscure ways, e.g. the
automation failing or looping on something it could previously do.
Repairing this then typically involves again breaking the goal down into
smaller/more explicit steps. This happens particularly in proofs that
are written in "bad style" (in my opinion).
Manuel
From: Lawrence Paulson <lp15@cam.ac.uk>
There's a whole class of such errors that fall under the general heading of "omitting something obvious". Other people have already mentioned instances of this, in particular omitting type constraints or type class constraints, which makes the claim too general. Also under this heading: omitting assumptions that seem too obvious to state (e.g. that any mentioned divisors are nonzero, or, for a theorem about an integral, forgetting to assume that the given function has an integral).
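The divisor case, as a minimal sketch:

```isabelle
(* Too general: false for b = 0, since then the left-hand side is 0 *)
lemma "(a * b) div b = (a::nat)"
  oops

(* With the "obvious" assumption made explicit *)
lemma "b ≠ 0 ⟹ (a * b) div b = (a::nat)"
  by simp
```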
Another class of errors comes under "setting up induction wrong". This might mean using the wrong induction rule, trying to prove something that is insufficiently general, or incorrectly incorporating side conditions into the induction. The latter error is Isabelle-specific and has to do with supplying side conditions (e.g. via "using") to the induction method and making sure to specify any variables in them as "arbitrary".
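The "insufficiently general" case is the classic accumulator-reversal example from the Isabelle tutorials; the lemma is only provable by induction once ys is generalized via "arbitrary":

```isabelle
(* Tail-recursive list reversal with an accumulator *)
fun itrev :: "'a list ⇒ 'a list ⇒ 'a list" where
  "itrev [] ys = ys"
| "itrev (x # xs) ys = itrev xs (x # ys)"

lemma "itrev xs ys = rev xs @ ys"
  by (induction xs arbitrary: ys) auto
```

Without "arbitrary: ys", the induction hypothesis fixes ys and the step case does not go through.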
Larry
From: Talia Ringer <tringer@cs.washington.edu>
Thanks, all, these are super helpful!
Last updated: Jan 04 2025 at 20:18 UTC