From: Mikhail Mandrykin <mandrykin@ispras.ru>
Hello!
A bit of context: We are currently experimenting with an approach to
more reliable SMT proof replay in Isabelle/HOL with the aim to later
apply it to low-level code verification, where there are typically many
similar and fairly uninteresting goals that can be easily discharged by
SMT. The approach is based on deterministic quantifier instantiation in
Isabelle/HOL itself, so that the solver is provided with a
quantifier-free problem, which is always decidable, making the proof
replay very stable and even enabling partial model resonstruction of
counterexamples (a preliminary description, if interesting:
https://forge.ispras.ru/attachments/download/7602/TSMT_Tutorial.pdf
but for now the reconstruction of deterministic one-liners from normal
SMT proofs is not yet properly implemented (requires manual efforts)).
So, naturally, we explored the possibility to replay the proofs within
Isabelle/HOL itself using its internal solvers such as Metis and Argo
(for the fragment without linear integer arithmetic). While we had some
success with Metis+our instantiation (although it's not consistently
faster than SMT proof replay), we encountered unexpected failures in
Argo, particularly in its congruence closure propagation. I prepared a
sample theory
https://gist.github.com/schrodibear/fbe8c2224291e45c9e51c3b812746a46
(for some reason, the list rejected this inlined in the message) with
two goals, where Argo fails with an unexpected exception. The second
goal is in fact valid and can be discharged with Metis. A more general
question is where is it appropriate to reports such problems and whether
Argo is maintained in the upstream or it would be reasonable to try
investigating this ourselves and possibly suggest a patch. The Isabelle
version is Isabelle2020, current development version seems to not have
any changes in Argo since then
Regards,
Mikhail
From: Jasmin Blanchette via Cl-isabelle-users <cl-isabelle-users@lists.cam.ac.uk>
Dear Mikhail,
A bit of context: We are currently experimenting with an approach to more reliable SMT proof replay in Isabelle/HOL with the aim to later apply it to low-level code verification, where there are typically many similar and fairly uninteresting goals that can be easily discharged by SMT. The approach is based on deterministic quantifier instantiation in Isabelle/HOL itself, so that the solver is provided with a quantifier-free problem, which is always decidable, making the proof replay very stable and even enabling partial model resonstruction of counterexamples (a preliminary description, if interesting:
https://forge.ispras.ru/attachments/download/7602/TSMT_Tutorial.pdf
This sounds very interesting. Most of the issues we have appear to be related to Z3's incomplete quantifier instantiation. CVC4 finds a proof using different (probably incomplete) heuristics and Z3 just seems not to find the same quantifiers. I was always afraid an approach based on instantiation would be too verbose (in the output), but now I've changed my mind: Most lemmas are probably used only with one instantiation anyway. It's worth trying anyway.
The second goal is in fact valid and can be discharged with Metis.
I'd expect Metis to, in principle, always terminate on such ground goals. I say "in principle" because it had some completeness bugs, but we're working right now on updating Metis in Isabelle to reflect the latest version.
A more general question is where is it appropriate to reports such problems and whether Argo is maintained in the upstream or it would be reasonable to try investigating this ourselves and possibly suggest a patch. The Isabelle version is Isabelle2020, current development version seems to not have any changes in Argo since then
Argo's developer and maintainer is Sascha Böhme (in CC:). He's in industry now and I don't know how responsive he will be. But do check with him first, and failing that, I would propose you investigate it yourself and send me a patch. (When Sascha left, I inherited some of his code, including "smt", but "argo" is a tool he developed afterwards as a hobby and I don't know it at all.)
Cheers,
Jasmin
From: Tobias Nipkow <nipkow@in.tum.de>
On 06/07/2020 08:37, Jasmin Blanchette via Cl-isabelle-users wrote:
Dear Mikhail,
A bit of context: We are currently experimenting with an approach to more reliable SMT proof replay in Isabelle/HOL with the aim to later apply it to low-level code verification, where there are typically many similar and fairly uninteresting goals that can be easily discharged by SMT. The approach is based on deterministic quantifier instantiation in Isabelle/HOL itself, so that the solver is provided with a quantifier-free problem, which is always decidable, making the proof replay very stable and even enabling partial model resonstruction of counterexamples (a preliminary description, if interesting:
https://forge.ispras.ru/attachments/download/7602/TSMT_Tutorial.pdfThis sounds very interesting. Most of the issues we have appear to be related to Z3's incomplete quantifier instantiation. CVC4 finds a proof using different (probably incomplete) heuristics and Z3 just seems not to find the same quantifiers. I was always afraid an approach based on instantiation would be too verbose (in the output), but now I've changed my mind: Most lemmas are probably used only with one instantiation anyway. It's worth trying anyway.
A general suggestion for sledgehammer: whenever I try to understand a metis
proof found by s/h, I wish s/h would instantiate the lemmas (a little bit)
explicitly because it makes it so much easier to figure out the proof as a
human. Maybe it could be an option and would work only in a proper context where
the variable names have been fixed.
Just one of these things it would be nice to have if the s/h developers had
unbounded time ;-)
Tobias
The second goal is in fact valid and can be discharged with Metis.
I'd expect Metis to, in principle, always terminate on such ground goals. I say "in principle" because it had some completeness bugs, but we're working right now on updating Metis in Isabelle to reflect the latest version.
A more general question is where is it appropriate to reports such problems and whether Argo is maintained in the upstream or it would be reasonable to try investigating this ourselves and possibly suggest a patch. The Isabelle version is Isabelle2020, current development version seems to not have any changes in Argo since then
Argo's developer and maintainer is Sascha Böhme (in CC:). He's in industry now and I don't know how responsive he will be. But do check with him first, and failing that, I would propose you investigate it yourself and send me a patch. (When Sascha left, I inherited some of his code, including "smt", but "argo" is a tool he developed afterwards as a hobby and I don't know it at all.)
Cheers,
Jasmin
From: Mikhail Mandrykin <mandrykin@ispras.ru>
Dear Jasmin,
Jasmin Blanchette via Cl-isabelle-users писал 2020-07-06 09:37:
This sounds very interesting. Most of the issues we have appear to be
related to Z3's incomplete quantifier instantiation. CVC4 finds a
proof using different (probably incomplete) heuristics and Z3 just
seems not to find the same quantifiers. I was always afraid an
approach based on instantiation would be too verbose (in the output)
In our approach we implemented a version of E-matching based on a static
and deterministic over- and under-approximations of the congruence
relation inside our tactic, so the proofs lines do not have to
explicitly provide all the instantiations, most of them can be
rediscovered by the algorithm. Just for curiosity, an example of proof
with and without all explicit instantiations:
define w⇩s w⇩e w⇩I w⇩I⇩0 w⇩i⇩r w⇩I⇩1 w⇩i⇩r⇩0 where
"w⇩s ≡ 𝗐⇩s S (Domain (apsnd Container rel h_ent))"
"w⇩e ≡ 𝗐⇩e (S - (apsnd Container
rel h_ent) S) {}"
"w⇩I ≡ 𝗐⇩I w⇩s (apsnd Container ` rel s'.h_ent) S"
"w⇩I⇩0 ≡ 𝗐⇩I w⇩e (apsnd Container ` rel s'.h_ent) S"
"w⇩i⇩r ≡ 𝗐⇩i⇩r w⇩I w⇩s Container (rel s'.h_ent)"
"w⇩I⇩1 ≡ 𝗐⇩I w⇩I⇩0 (apsnd Container ` rel s'.h_ent) S"
"w⇩i⇩r⇩0 ≡ 𝗐⇩i⇩r w⇩I⇩1 w⇩I⇩0 Container (rel s'.h_ent)"
thus "S - (apsnd Container ` rel s'.h_ent)
S ≠ {}"
by
(tsmt
set_eqIT[of "S - (apsnd Container rel h_ent) `` S" "{}"]
subsetIT[of "S" "Domain (apsnd Container
rel h_ent)"]
emptyT[of "w⇩e"] emptyT[of "w⇩s"] DiffT[of "w⇩e" "S" "(apsnd
Container rel s'.h_ent) `` S"]
DiffT[of "w⇩e" "S" "(apsnd Container
rel h_ent) S"]
DiffT[of "w⇩s" "S" "(apsnd Container ` rel s'.h_ent)
S"]
ImageT[of "w⇩e" "apsnd Container rel s'.h_ent" "S"]
ImageT[of "w⇩s" "apsnd Container
rel s'.h_ent" "S"]
upd_Domain_apsnd_relT[of "w⇩s" "h_ent" "Object ob" "{(c, n)}" ]
upd_apsnd_relT[of "w⇩I⇩0" "w⇩e" "h_ent" "Object ob" "{(c, n)}"]
ImageIT[of "apsnd Container rel h_ent" "S" "w⇩I⇩0" "w⇩e"]
subsetT[of "S" "Domain (apsnd Container
rel s'.h_ent)" "w⇩s"]
DiffT[of "w⇩I⇩0" "S" "(apsnd Container rel s'.h_ent) `` S"]
emptyT[of "w⇩I⇩0"]
ImageT[of "w⇩I⇩0" "apsnd Container
rel s'.h_ent" "S"]
image_apsndT[of "w⇩I⇩1" "w⇩I⇩0" "Container" "rel s'.h_ent"]
image_apsndT[of "w⇩I" "w⇩s" "Container" "rel s'.h_ent"]
tag_ObjectT[of "ob"] tag_ContainerT[of "w⇩i⇩r⇩0"]
tag_ContainerT[of "w⇩i⇩r"])
(explicit version automatically extracted by our proof method from the
following short proof line without instances)
vs.
by (tsmt
set_eqIT subsetIT emptyT DiffT ImageT upd_Domain_apsnd_relT
upd_apsnd_relT ImageIT subsetT DiffT
sets(8, 14, 16) tags(3, 4))
Another, arguably more important problem here, is that since we want our
method to be deterministic, we have to perform full instantiation
according to the algorithm, but this is possible only if the
instantiation terminates. However, in general it does not, so instead of
applying E-matching once until fixpoint, we separately provide syntax to
apply it either until fixpoint or just once. Applying matching once for
every lemma is the default. It makes instantiation fully deterministic,
but makes the proof line discovery harder since we need not simply a set
of lemmas, but a list with proper ordering, because there are
dependencies between the instances. In particular, the list may contain
repeated application of the same lemma as for DiffT in the example
above. So what we are now planning to implement is a heuristical
ordering (possibly with duplication) of facts extracted by sledgehammer
to maximize the set of resulting instances. If we get the proof replayed
within our method once, we can extract the instantiations and further
optimize the proof by removing some redundant instances (this
optimization is already implemented).
I'd expect Metis to, in principle, always terminate on such ground
goals.
We have not encountered cases, where Metis does not terminate for sure
(a small formula, but, say >90s of solving), but even for ground
formulas the time for proof replay grows significantly faster than with
SMT (Z3 with proof reconstruction), e.g. for a formula with 500 ground
assumptions (many redundant instances arising without explicit
instantiation) Metis takes >90s, while SMT takes 42ms for Z3 + 98ms for
proof replay. So we hoped that maybe Argo as an SMT solver can be faster
on such formulas.
Regards,
Mikhail
From: Lawrence Paulson <lp15@cam.ac.uk>
in fact, my main technique for eliminating calls to smt to work out (manually) these instantiations and then supply the ground instances to sledgehammer. The result is frequently a different proof using Metis, simp, etc.
Larry
From: Jasmin Blanchette via Cl-isabelle-users <cl-isabelle-users@lists.cam.ac.uk>
Hi Tobias,
I know. :) You made the suggestion years ago, and back then I didn't prioritize it and failed to understand its potential (documentation, speed up reconstruction, and possibly make the difference between semidecidability and decidability). In retrospect, it's obvious, and just three days ago, I added the following item (#28!) to the Sledgehammer TODO list:
Génération de preuves Isar simples pour arithmétique et instantiation de quantifieurs
Aujourd'hui, il y a plusieurs cas où la génération de preuve Isar prend trop de temps
ou est trop compliquée alors qu'une preuve du genre un have pour chaque lemme
théorie (arithmetique) et un show à la fin qui appelle typiquement metis.
Similairement, CVC4 et peut-être d'autres solveurs SMT peuvent générer les
instances ground des axiomes utilisées. Ça pourrait être utilisé avec note, p.ex.,
note f1 = some_lemma[of a b c], et ensuite un show à la fin. Ou bien directement un
seul gros appel laid à metis etc.
Thankfully it's a relatively easy change, not a full PhD, so it might get done first. Also because I have a bad conscience of not having done it back in 2012-2014.
Jasmin
Last updated: Nov 21 2024 at 12:39 UTC