[isabelle] Z3 proof reconstruction - optimizations · Archive Mirror: Isabelle Users Mailing List

Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] Z3 proof reconstruction - optimizations

Email Gateway (Aug 19 2022 at 15:32):

From: Filip Marić <filipmatfbgacrs@gmail.com>
Hi!

In a recent project I have been extensively using automation provided by
the SMT solver interface in Isabelle 2013-2 and it had really been a great
experience (many thanks to Sasha, Tjark, Jasmin and other members of the
team for making this option available in Isabelle)! However, I have noticed
that proof reconstruction of some small and fast z3 proofs goes
unexpectedly slow (e.g., z3 produces a proof in under a second and a proof
consists only of several thousand steps, but reconstruction goes well
beyond 10 seconds) and I have decided to investigate this issue deeper. By
using the [[smt_trace = true]] option, I think that I have pinpointed some
sources of inefficiency. Although I do not have explicit timings for each
proof step (what would be a desirable feature), it seemed pretty obvious
that proof reconstruction gets stuck only in a very small number of proof
steps, and interestingly, these are all "rewrite" steps - they are all
given to fast (logic) method and after a rather long period of time it
succeeds (and even worse, in a very small number of cases it seems that it
gets stuck).

One case of such behavior occurred when proving lemmas whose conclusion
contains <-->. At first I solved the problem by using (by (rule iffI, smt+)
instead of by smt), but after examining the problematic rewrite steps I
have noticed that I do not need to do this and that many such proofs can be
made enormously faster by adding the following z3_rule:

lemma [z3_rule]:
"(\<not> (A \<longleftrightarrow> \<not> B)) \<longleftrightarrow> (A
\<longleftrightarrow> B)"
by simp

Then, the problematic rewrite steps are solved in the first pass and are
never even given to the "fast (logic)" method which was very slow on them.

In other cases that I noticed that are problematic in my project I have
also spotted a pattern, but this happened with much more complicated
formulas so I am not sure how to fix this. Here are some examples of
problematic proof steps of this kind (all are rewrite steps):

(iff
(and #72578 (not #72574) #72571 (not #72567) (not (or #134247 #134251
#134255 #134259 #134263 #133898 #133902 #133907)) #72564)
(not (or (not #72578) #72574 (not #72571) #72567 #134247 #134251 #134255
#134259 #134263 #133898 #133902 #133907 (not #72564))))

(iff
(and (not #75701) #72571 (not #72567) (not (or #134259 #134724 #134263
#134729 #134733 #133911 #134575 #133915)) #76020)
(not (or #75701 (not #72571) #72567 #134259 #134724 #134263 #134729 #134733
#133911 #134575 #133915 (not #76020))))

(iff
(and (not #96335) #93205 (not #93201) (not (or #140614 #141079 #140618
#141084 #141088 #140266 #140930 #140270)) #96654)
(not (or #96335 (not #93205) #93201 #140614 #141079 #140618 #141084
#141088 #140266 #140930 #140270 (not #96654))))

(iff
(and (not #93212) (not #93208) #93892 (not #92681) (not (or #140769 #140773
#140602 #140777 #140614 #140436 #140440 #140253)) #93887)
(not (or #93212 #93208 (not #93892) #92681 #140769 #140773 #140602 #140777
#140614 #140436 #140440 #140253 (not #93887))))

(iff
(and (not #181613) #181578 #206617 #180700 #180691 #181342 #181291 (not (or
(not #206438) (not #206236))) #206701)
(not (or #181613 (not #181578) (not #206617) (not #180700) (not #180691)
(not #181342) (not #181291) (not #206438) (not #206236) (not #206701))))

Hash-numbers in these formulas represent very complex atoms of linear
arithmetic (some of them extensively use ite operator). It can be noticed
that in all these cases the lhs of iff is a conjuction containing atoms and
exactly one negation of a disjunction.

(1)
Is it possible to add some simple z3_rules that would cover all these proof
steps and to prevent to give this large formule to the fast (logic) method
as it turns out that it needs much time (several seconds) to discharge them
(and in some cases it seems that it gets totally stuck and cannot finish
their proof)?

(2)
Is it possible to set a timeout for each proof step or for the whole proof
reconstruction (it seems to me that smt_timeout is only for SMT solver
time, and not the reconstruction time, but maybe I am wrong here)? From a
users perspective it seems that it would be good to have a timeout for each
proof step as it would help detecting the cases were reconstruction gets
stuck or is very slow. Otherwise i must either use smt_trace which is slow
because of the large output or I must wait very long time and never know if
the reconstruction is stuck somewhere. A very nice option would be to give
some progress ratio in the trace -- as you know the total number of proof
steps that need to be reconstructed, from time to time (e.g., every second)
you could print the percent of reconstructed proof steps. Also, is it
possible to get some statistics about the number of proof steps of each
kind and time spent for their reconstruction in Isabelle?

I hope that you have some quick and nice solution to the issue (1), and
issue (2) is only a feature request from a user so I know that it cannot be
done immediately, but I hope that some of this could be incorporated in
some future releases :)

Best regards,
Filip

Email Gateway (Aug 19 2022 at 16:27):

From: Jasmin Christian Blanchette <jasmin.blanchette@gmail.com>
Hi Filip,

First, sorry for the enormous delay in answering your email. I was hoping at first that some of the people who have more of a clue about reconstruction might answer first, but now sweeping through my "Isabelle" email folter I see that nobody answered it yet.

In a recent project I have been extensively using automation provided by
the SMT solver interface in Isabelle 2013-2

Be aware that Sascha and I have reimplemented the SMT module for Isabelle2014. The new module is called SMT2 -- the corresponding proof method is "smt2". In the current repository version, and in the future Isabelle2015, SMT is renamed to Old_SMT (and "old_smt") and SMT2 becomes SMT again.

The main changes are that (1) the new module is now based on SMT-LIB 2; (2) we now use the latest Z3 (4.3.2) and CVC4 (1.5-prerelease); (3) the Z3 proof reconstruction code has been rewritten from scratch (by Sascha); and (4) we now offer Isar proofs as an alternative to "smt", for those who don't want to rely on Z3 for reconstruction (e.g. for submitting to Isabelle or the AFP).

One case of such behavior occurred when proving lemmas whose conclusion
contains <-->. At first I solved the problem by using (by (rule iffI, smt+)
instead of by smt), but after examining the problematic rewrite steps I
have noticed that I do not need to do this and that many such proofs can be
made enormously faster by adding the following z3_rule:

lemma [z3_rule]:
"(\<not> (A \<longleftrightarrow> \<not> B)) \<longleftrightarrow> (A
\<longleftrightarrow> B)"
by simp

Then, the problematic rewrite steps are solved in the first pass and are
never even given to the "fast (logic)" method which was very slow on them.

I'll try adding that.

In other cases that I noticed that are problematic in my project I have
also spotted a pattern, but this happened with much more complicated
formulas so I am not sure how to fix this. Here are some examples of
problematic proof steps of this kind (all are rewrite steps):

(iff
(and #72578 (not #72574) #72571 (not #72567) (not (or #134247 #134251
#134255 #134259 #134263 #133898 #133902 #133907)) #72564)
(not (or (not #72578) #72574 (not #72571) #72567 #134247 #134251 #134255
#134259 #134263 #133898 #133902 #133907 (not #72564))))

...

(iff
(and (not #181613) #181578 #206617 #180700 #180691 #181342 #181291 (not (or
(not #206438) (not #206236))) #206701)
(not (or #181613 (not #181578) (not #206617) (not #180700) (not #180691)
(not #181342) (not #181291) (not #206438) (not #206236) (not #206701))))

Hash-numbers in these formulas represent very complex atoms of linear
arithmetic (some of them extensively use ite operator). It can be noticed
that in all these cases the lhs of iff is a conjuction containing atoms and
exactly one negation of a disjunction.

It might be that the issue goes away with the new module. Sascha has implemented a new proof producing SAT solver in Isabelle (to be used by "sat" and "satx") and uses it for propositional reasoning, which should work for the above formulas. That would be my answer to point (1):

(1)
Is it possible to add some simple z3_rules that would cover all these proof
steps and to prevent to give this large formule to the fast (logic) method
as it turns out that it needs much time (several seconds) to discharge them
(and in some cases it seems that it gets totally stuck and cannot finish
their proof)?

Regarding point (2):

(2)
Is it possible to set a timeout for each proof step or for the whole proof
reconstruction (it seems to me that smt_timeout is only for SMT solver
time, and not the reconstruction time, but maybe I am wrong here)? From a
users perspective it seems that it would be good to have a timeout for each
proof step as it would help detecting the cases were reconstruction gets
stuck or is very slow. Otherwise i must either use smt_trace which is slow
because of the large output or I must wait very long time and never know if
the reconstruction is stuck somewhere. A very nice option would be to give
some progress ratio in the trace -- as you know the total number of proof
steps that need to be reconstructed, from time to time (e.g., every second)
you could print the percent of reconstructed proof steps. Also, is it
possible to get some statistics about the number of proof steps of each
kind and time spent for their reconstruction in Isabelle?

I hope that you have some quick and nice solution to the issue (1), and
issue (2) is only a feature request from a user so I know that it cannot be
done immediately, but I hope that some of this could be incorporated in
some future releases :)

What you are asking for are really debugging features for powerusers. You happen to be such a user, but we designed "smt" for people who don't necessarily know how SMT solvers work. I am currently just trying to keep it alive. My main interest is rather in the Isar proof reconstruction, which is so far less powerful and less efficient but has the merit of kicking Z3 out of the reconstruction loop. Judging from the difficulty of your SMT problems, it looks like the "smt" method is perhaps still the best option for you.

Starting next year, I will be working on a project where I might find myself adding more theory support to Isabelle's SMT module. If it comes to this, and I start diving into the code, I might very well add some of the features you describe. Otherwise, my recommendation would be for you to implement what you need yourself.

One more point: According to my experience, "smt" succeeds in reconstructing proofs in about 99% of cases where Z3 succeeds from Sledgehammer. Judging from your comments, your problems appear to be more difficult than typical Sledgehammer problems. Could you give us some details on your use of "smt"?

Regards,

Jasmin

Email Gateway (Aug 19 2022 at 16:57):

From: boehmes@in.tum.de
Hi Filip,

Your feature request (2) will be available in the next release of Isabelle. There are two new options:

smt_reconstruction_step_timeout specifies the maximum time for reconstructing a single Z3 step
smt_statistics prints statistics for Z3 proof reconstruction: the total runtime, the number and the individual runtime of the reconstructed Z3 rules, and the progress of Z3 proof reconstruction based on the number of reconstructed steps (after each block of 100 successfully reconstructed steps, the runtime is printed)

Cheers,

Sascha

Last updated: Jul 12 2025 at 08:27 UTC