Stream: Mirror: Isabelle Development Mailing List

Topic: [isabelle-dev] AFP: PAPP_Impossibility not terminating


view this post on Zulip Email Gateway (Apr 17 2023 at 12:58):

From: Makarius <makarius@sketis.net>
We have a problem with PAPP_Impossibility for quite some time.

I am presently running a bisection to see better where it actually happens:
presently the interval is Isabelle/f5aca3ed1adb .. 69ee23f83884 based on
https://isabelle.sketis.net/devel/build_status/AFP/PAPP_Impossibility.csv

Stay tuned ...

Makarius


isabelle-dev mailing list
isabelle-dev@in.tum.de
https://mailman46.in.tum.de/mailman/listinfo/isabelle-dev

view this post on Zulip Email Gateway (Apr 17 2023 at 13:40):

From: Fabian Huch <huch@in.tum.de>
From Jenkins, the tighter interval is Isabelle/3fb2c47a7605 ..
Isabelle/1a9decb8bfbc [1,2].

Fabian

[1]: https://ci.isabelle.systems/jenkins/job/isabelle-all/4411/

[2]: https://ci.isabelle.systems/jenkins/job/isabelle-all/4412/


isabelle-dev mailing list
isabelle-dev@in.tum.de
https://mailman46.in.tum.de/mailman/listinfo/isabelle-dev

view this post on Zulip Email Gateway (Apr 17 2023 at 21:39):

From: Makarius <makarius@sketis.net>
The first bad revision is:
changeset: 77808:b43ee37926a9
user: wenzelm
date: Mon Apr 10 22:38:18 2023 +0200
summary: performance tuning: replace Ord_List by Set();

This is a change of the thm hyps field to make applications like
PAPP_Impossibility actually faster, but I did not notice that this is such an
application, and that it becomes very slow.

Even worse, it is not sufficient to take b43ee37926a9 back, because other
changes on top appear to cause similar problems.

I need to look more closely into the special SAT prover in PAPP_Impossibility,
to see what is really going on.

Makarius


isabelle-dev mailing list
isabelle-dev@in.tum.de
https://mailman46.in.tum.de/mailman/listinfo/isabelle-dev

view this post on Zulip Email Gateway (Apr 18 2023 at 08:07):

From: Manuel Eberl <manuel@pruvisto.org>
Oh dear, that is unfortunate.

PAPP_Impossibility is probably one of the most brutal benchmarks of
low-level theorem operations we have. The SAT prover in that entry
essentially proves a large number of clauses (a few hundred thousand)
and then starts plugging them together in a specific, externally
determined way until it derives the empty clause. The first part is
quite fast and still works without any problems. The really
resource-intensive bit is the second part, and this is the part that
doesn't terminate anymore (or more probably, takes much longer than it
did before).

I did notice at the time that the vast majority of the time was spent in
taking the union of the hyps of the different theorems being combined,
so I made sure that when I create the theorems, they all have
pointer-equal hyps, which sped things up phenomenally.

Hope that helps. If you have any questions about the code, I'll be happy
to answer them. I should also still have a few smaller example
applications of that SAT solver lying around that one could use; there
the difference between the old and the new version would perhaps not be
"2 minutes" vs "non-termination" but rather something like "almost
instant" and "half a minute" or so, which may make debugging less painful.

Manuel


isabelle-dev mailing list
isabelle-dev@in.tum.de
https://mailman46.in.tum.de/mailman/listinfo/isabelle-dev

view this post on Zulip Email Gateway (Apr 18 2023 at 09:05):

From: Makarius <makarius@sketis.net>
On 18/04/2023 10:07, Manuel Eberl wrote:

I did notice at the time that the vast majority of the time was spent in
taking the union of the hyps of the different theorems being combined, so I
made sure that when I create the theorems, they all have pointer-equal hyps,
which sped things up phenomenally.

Did you re-invent this yourself, or did you see it here?
https://isabelle-dev.sketis.net/source/isabelle/browse/default/src/HOL/Tools/sat.ML$348

I should also still have a few smaller example applications of
that SAT solver lying around that one could use; there the difference between
the old and the new version would perhaps not be "2 minutes" vs
"non-termination" but rather something like "almost instant" and "half a
minute" or so, which may make debugging less painful.

It would be great to have such small benchmark examples. Please send them to
me privately.

Makarius


isabelle-dev mailing list
isabelle-dev@in.tum.de
https://mailman46.in.tum.de/mailman/listinfo/isabelle-dev

view this post on Zulip Email Gateway (Apr 18 2023 at 09:37):

From: Manuel Eberl <manuel@pruvisto.org>
I do know of that one as well, but it uses some old and rather obscure
SAT solvers (with some custom patches even, I think) and a different
proof format as well (that isn't really used by any modern SAT solvers,
I think). Also, as I recall there was only one suppoerted SAT solver
that I managed to find, compile, and run. And then the approach didn't
really scale to my kind of setting. In earlier versions of the same
entry I had SAT instances with 20 million clauses, which was simply too
much.

My implementation uses the DRAT format, which seems to be a kind of
de-facto standard in the SAT community nowadays.

I did take some inspiration from the implementation you linked though. I
don't think they had to do any of the low-level fine-tuning with hyps
that I had to do though.

Manuel


isabelle-dev mailing list
isabelle-dev@in.tum.de
https://mailman46.in.tum.de/mailman/listinfo/isabelle-dev

view this post on Zulip Email Gateway (Apr 18 2023 at 09:52):

From: Makarius <makarius@sketis.net>
On 18/04/2023 11:37, Manuel Eberl wrote:

I do know of that one as well, but it uses some old and rather obscure SAT
solvers (with some custom patches even, I think) and a different proof format
as well (that isn't really used by any modern SAT solvers, I think). Also, as
I recall there was only one suppoerted SAT solver that I managed to find,
compile, and run. And then the approach didn't really scale to my kind of
setting. In earlier versions of the same entry I had SAT instances with 20
million clauses, which was simply too much.

It is clear that this old SAT implementation is not usable anymore.

My question was more in the direction: How much old/obsolete things from
sat.ML are to be expected as clones in your version?

Apparently little or none.

My implementation uses the DRAT format, which seems to be a kind of de-facto
standard in the SAT community nowadays.

I did take some inspiration from the implementation you linked though. I don't
think they had to do any of the low-level fine-tuning with hyps that I had to
do though.

OK, this already provides some clues.

Makarius


isabelle-dev mailing list
isabelle-dev@in.tum.de
https://mailman46.in.tum.de/mailman/listinfo/isabelle-dev

view this post on Zulip Email Gateway (Apr 18 2023 at 10:41):

From: Makarius <makarius@sketis.net>
I have now managed to revert several changes to make PAPP_Impossibility works
again --- nothing pushed yet.

Makarius


isabelle-dev mailing list
isabelle-dev@in.tum.de
https://mailman46.in.tum.de/mailman/listinfo/isabelle-dev

view this post on Zulip Email Gateway (Apr 18 2023 at 21:37):

From: Makarius <makarius@sketis.net>
To conclude this divergence, the situation in Isabelle/dd222e2af01a +
AFP/fcf84c1dcf9b is now as follows:

* All my inference kernel changes concerning the improved Table() and Set()
functors have been reverted: they help for large persistent data, but not for
data that changes dynamically a lot (e.g. the hyps in inferences, where merge
is critical).

* There are some minor changes to
AFP/thys/PAPP_Impossibility/papp_impossibility.ML to make it faster than
before: this includes the version that you had sent me privately + plus some
tuning. The key changes are:

changeset: 13550:317d450e66bc
user: wenzelm
date: Tue Apr 18 20:45:33 2023 +0200
files: thys/PAPP_Impossibility/papp_impossibility.ML
description:
alternative version by Manuel Eberl, following the approach of
Isabelle2022/src/HOL/Tools/sat.ML with one big hyp;

changeset: 13552:351b7b576892
user: wenzelm
date: Tue Apr 18 20:58:25 2023 +0200
files: thys/PAPP_Impossibility/papp_impossibility.ML
description:
eliminated pointless parallelism: approx. 1.3s as plain sequential ML;

Here are some timings:

Finished PAPP_Impossibility (0:03:23 elapsed time, 0:04:51 cpu time, factor 1.43)

Not bad for something that appeared like a huge monster proof.

Makarius


isabelle-dev mailing list
isabelle-dev@in.tum.de
https://mailman46.in.tum.de/mailman/listinfo/isabelle-dev


Last updated: Mar 04 2024 at 10:08 UTC