Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] Failing afp-2015 build


view this post on Zulip Email Gateway (Aug 22 2022 at 11:51):

From: Lars Hupel <hupel@in.tum.de>
Dear list,

while trying to run some performance tests on potential future Jenkins
servers, I've encountered a – in my opinion – severe problem. The setup is
as follows:

$ isabelle version
Isabelle2015: May 2015

$ uname -a
Linux lars-demon 4.2.3-1-ARCH #1 SMP PREEMPT Sat Oct 3 18:52:50 CEST 2015
x86_64 GNU/Linux

$ hg id # afp-2015 repository
770bf7fe6d2b tip

If I now run the session "ConcurrentGC", this happens:

$ isabelle build -bv -d thys -o threads=1 ConcurrentGC
[...]
ML_PLATFORM="x86-linux"
ML_HOME="/opt/Isabelle2015/contrib/polyml-5.5.2-3/x86-linux"
ML_SYSTEM="polyml-5.5.2"
ML_OPTIONS="-H 500"
[...]
*** A total of 104 subgoals...
*** At command "by" (line 130 of "~/proj/afp-2015/thys/ConcurrentGC/TSO.thy")
Unfinished session(s): ConcurrentGC

Exactly the same error occurs with the following settings:

ML_PLATFORM="x86_64-linux"
ML_HOME="/opt/Isabelle2015/contrib/polyml-5.5.2-3/x86_64-linux"
ML_SYSTEM="polyml-5.5.2"
ML_OPTIONS="-H 2000"

I have managed to reproduce this behaviour on three different machines.
However, now comes the worrying part: If I load the failing theory in
Isabelle/jEdit, it goes through just fine! Same goes for building without
"-o threads=1" *. This is a real problem, because I was actually planning
to run each session of the AFP with "-o threads=1". Can anybody reproduce
this behaviour? I'm scared ...

Best
Lars

view this post on Zulip Email Gateway (Aug 22 2022 at 11:51):

From: Lars Hupel <hupel@in.tum.de>

Update: Build has finished on my quad-core Core i7.

Timing ConcurrentGC (4 threads, 3055.987s elapsed time, 7458.113s cpu
time, 643.593s GC time, factor 2.44)
Finished ConcurrentGC (0:51:48 elapsed time, 2:05:59 cpu time, factor 2.43)
Finished at Wed Nov 11 21:58:23 CET 2015
0:51:52 elapsed time, 2:06:15 cpu time, factor 2.43

It is still running on an 8-core "cloud" machine, but that one has also
progressed beyond "TSO".

view this post on Zulip Email Gateway (Aug 22 2022 at 11:52):

From: Peter Lammich <lammich@in.tum.de>
We have analysed the problem:
It appears in both Isabelle2015 and Isabelle 40ca618e1b2d.

The behaviour of PARALLEL_GOALS may differ depending on whether there
is one thread or more. There seems to be no documentation of the
constraints under which the parallel and sequential behaviour coincide,
except for a NEWS-file entry from 2011, which talks about no
schematic variables and "uniform" behaviour.

The "vcg_jackhammer" tactic implemented in ConcurrentGC appears to hit
one of the cases where the behaviour differs, and thus triggers the
observed error.

It is unclear whether the problem lies in the jackhammer tactic or the
PARALLEL_GOALS combinator. In any case, the current state is very
unfortunate, as proof methods may behave differently depending on the
number of threads.

In my opinion, the PARALLEL_GOALS tactical should completely hide the
effects of parallelism from the user, even if this means some slowdown
in the sequential case. (Note: removing the optimization for the
sequential case in PARALLEL_GOALS, and using the same map-reduce
approach with retrofitting as in the parallel case, fixes the problem in
ConcurrentGC.)
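
For illustration, a minimal hypothetical sketch in Isabelle/ML (not the actual
vcg_jackhammer code): a body tactic that only addresses the first subgoal is
not "uniform" in the sense of the 2011 NEWS entry, whereas one wrapped in
ALLGOALS is, and only the latter can be expected to behave the same for
threads=1 and threads>1.

ML ‹
  (* NOT uniform: always operates on subgoal 1, whatever goal state it sees *)
  fun first_only_tac ctxt = asm_full_simp_tac ctxt 1;

  (* uniform: addresses all subgoals that happen to be present, like simp_all *)
  fun all_goals_tac ctxt = ALLGOALS (asm_full_simp_tac ctxt);

  (* only the uniform variant is safe under the combinator *)
  fun par_tac ctxt = PARALLEL_GOALS (all_goals_tac ctxt);
›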

view this post on Zulip Email Gateway (Aug 22 2022 at 11:52):

From: Makarius <makarius@sketis.net>
Using the dangerous word "fix" means there is a lack of information about what
is really going on. Maybe the author of the "vcg_jackhammer" proof method
can look again and explain the situation.

Makarius

view this post on Zulip Email Gateway (Aug 22 2022 at 11:52):

From: Peter Lammich <lammich@in.tum.de>
"Fix" was meant in the sense "make disappear this very instance of the
problem". Actually the described change aligns the behaviour for
threads=1 and threads>1, sacrificing the optimization that is currently
there for threads=1.

However, two general questions remain:

1) Is it intentional/desirable to have the behaviour of PARALLEL_GOALS
depend on the number of threads?
Pro: This allows the current optimization for threads=1
Con: This makes debugging of tactics/proofs which use PARALLEL_GOALS
harder, as one has to test for threads=1 and threads>1 separately.

2) Is there any documentation of the preconditions under which
PARALLEL_GOALS shows no diverging behaviour between the parallel and
sequential case? This information is required by anyone who wants to use
PARALLEL_GOALS in a thread-safe way.

view this post on Zulip Email Gateway (Aug 22 2022 at 11:52):

From: Lars Hupel <hupel@in.tum.de>

Using the dangerous word "fix" means there is a lack of information about what
is really going on. Maybe the author of the "vcg_jackhammer" proof
method can look again and explain the situation.

Even if the AFP entry gets changed, we have a broader problem here. Let
me explain.

As Peter already said, there is no mention of "PARALLEL_GOALS" (or
related combinators) in the official documentation. The only relevant
mention we found is the following NEWS entry (from Isabelle2011-1):

* Refined PARALLEL_GOALS tactical: degrades gracefully for schematic
goal states; body tactic needs to address all subgoals uniformly.

Additionally, there are only a few occurrences in the visible Isabelle
universe: the distribution uses it in "clasimp" and "simp", and there
are only two more occurrences in the AFP which are not derived from what
is used in the distribution ("AWN" and "Slicing") *. Hence, the usual
heuristic of looking for canonical examples in the sources to figure
out how it should be used fails. (Also, both "AWN" and "Slicing" build
fine in single-threaded mode.)

Looking just at the implementation, it becomes obvious that the
sequential case stems from a – what I would call – "performance
optimization"; that is, if there is only one thread, there is no point
in setting up auxiliary goal states, so the inner tactic can proceed
undisturbed.

Now, there may very well be invariants which a tactic should fulfill in
order to make the sequential and the parallel execution observationally
equivalent. My point is that these invariants – in contrast to many
others – are not checked in this combinator, which allows
inconsistencies to sneak into the official AFP sources. Even if the AFP
entry gets edited in such a way that the usage of the "PARALLEL_GOALS"
combinator is canonical, there might be future non-canonical uses which
are difficult to detect (needless to say, ML tactic code – while often
necessary – is hard to understand). In fact, there are only two ways to
detect them that I can think of: time-intensive review of ML code, or
running entries in both single- and multi-threaded mode.

I'm a fairly young user on the general Isabelle time scale, but as far
as I remember, additional checks have been introduced quite often to
prevent the user from doing things which might "just work" but are not
intended to happen (e.g. context discipline). In my opinion the best way
to proceed is to both edit the AFP entry and to either remove the
optimization in the combinator or add additional robustness checks. In
the case of the "ConcurrentGC" session, these two measures are
independent of each other.

Cheers
Lars

view this post on Zulip Email Gateway (Aug 22 2022 at 11:54):

From: Lars Hupel <hupel@in.tum.de>
Hello Peter,

Why would you want that?

I don't think we want that; I believe it is by accident. From what
you're saying it sounds like your tactic is indeed "uniform", so it
should be acceptable for use with PARALLEL_GOALS.

(What is the point of running single-threaded these days?)

I only discovered that as part of my investigations into how to reliably
build the AFP. There are essentially two axes for parallelism: Internal
to a session (multi-threaded, e.g. parallel proofs, parallel theories)
and between sessions (multi-process, i.e. independent AFP sessions can
be run in parallel). Gerwin and I suspect that it would be most
efficient to exploit the latter at the cost of the former, since the AFP
dependency graph is quite sparse.

Cheers
Lars

view this post on Zulip Email Gateway (Aug 22 2022 at 11:55):

From: Makarius <makarius@sketis.net>
On Thu, 12 Nov 2015, Lars Hupel wrote:

As Peter already said, there is no mention of "PARALLEL_GOALS" (or
related combinators) in the official documentation.

I was actually under the impression that a small paragraph about that was
already in the relevant manuals, but that is not the case. It will happen
eventually. (I am committed to keeping the important reference manuals
up-to-date, but the massive amount of material is already a problem; many
people just ignore the manuals, either passively or actively.)

Anyway, official documentation is also the ML source and the cumulative
NEWS file.

The only relevant mention we found is the following NEWS entry:

Doing an exhaustive hypersearch on NEWS in Isabelle2015 provides the
following bits of information:

Isabelle2009-1 (December 2009)

* PARALLEL_CHOICE and PARALLEL_GOALS provide basic support for
parallel tactical reasoning.

Isabelle2011-1 (October 2011)

* Refined PARALLEL_GOALS tactical: degrades gracefully for schematic
goal states; body tactic needs to address all subgoals uniformly.

Isabelle2015 (May 2015)

* Tactical PARALLEL_ALLGOALS is the most common way to refer to
PARALLEL_GOALS.

This provides a glimpse of the typical bottom-up construction of new
Isabelle concepts. We start with something we know already (Par_List.map)
and work our way through further system structures, e.g. tactical
reasoning with goal and subgoal addressing.

You probably know that the Larry-Paulson-goal-state is great in many
respects, but also more concrete than semantically intended. This means
ML tactics and Isar proof methods need to be "well-behaved" in certain
ways. This is quite explicitly documented in the "implementation" manual,
but I still see violations routinely in Isabelle applications. The proof
parallelization occasionally changes the representation of goal states,
and tactic expressions seen in the wild break down.

The present situation seems to be much less spectacular, though.

For PARALLEL_GOALS, the October 2011 NEWS snippet says "body tactic
needs to address all subgoals uniformly". This means it needs to ignore
the accidental number of subgoals that it sees, and operate on all
subgoals that happen to be there, e.g. like simp_all.

Since it had already turned out to be a bit impractical to take this extra
condition into account in applications, and all concrete uses were using
ALLGOALS, the May 2015 version formalizes that situation with a new
PARALLEL_ALLGOALS combinator and makes it "the most common way" in the NEWS.
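
As a rough sketch (assuming the names quoted from the NEWS entries above), the
new combinator is just the canonical spelling of the old composition:

  PARALLEL_GOALS (ALLGOALS (tac ctxt))    (* pre-2015 idiom *)
  PARALLEL_ALLGOALS (tac ctxt)            (* Isabelle2015 shortcut *)

for a body tactic tac : Proof.context -> int -> tactic that addresses one
subgoal at a time and is applied to all of them.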

The ConcurrentIMP session was added before the Isabelle2015 release, where
PARALLEL_ALLGOALS was not yet available as canonical shortcut. Thus it
fell into the gap of the historical confusion of what PARALLEL_GOALS
really means.

So this is just a routine case of updating existing AFP entries. What is
less routine is the total run-time of that session, which makes
maintenance more difficult. I have already reduced it a lot by using the
newer 'subgoal' Isar command, but that is for the next Isabelle release.

I will standardize all occurrences, but only in afp-devel.

Makarius

view this post on Zulip Email Gateway (Aug 22 2022 at 11:55):

From: Makarius <makarius@sketis.net>
The behaviour and the name should fit together. A tactical that is a
parallel version of ALLGOALS should be called like that. If PARALLEL_GOALS
had been like PARALLEL_ALLGOALS from the outset, I would have also used
that name.

Makarius

view this post on Zulip Email Gateway (Aug 22 2022 at 11:56):

From: Peter Gammie <peteg42@gmail.com>
I’m not doing anything clever here - my use of PARALLEL_GOALS was intended to have the same semantics as ALLGOALS (IIRC - it’s been a while). As far as I’m concerned it’s just an optimisation, albeit one that was somewhat essential to invariant discovery.

The goals produced by vcg_jackhammer should be completely independent (no schematics etc).

I never tested the single-threaded mode, and don’t see the point in it having an observably-distinct (wrt goal states etc.) semantics from the multi-threaded one. Why would you want that? (What is the point of running single-threaded these days?)

Thanks to everyone for the diagnoses.

cheers,
peter (resent at Larry’s request)

view this post on Zulip Email Gateway (Aug 22 2022 at 11:57):

From: Peter Gammie <peteg42@gmail.com>
On 17 Nov 2015, at 23:03, Makarius <makarius@sketis.net> wrote:

The ConcurrentIMP session was added before the Isabelle2015 release, where PARALLEL_ALLGOALS was not yet available as canonical shortcut. Thus it fell into the gap of the historical confusion of what PARALLEL_GOALS really means.

When would one wish to use PARALLEL_GOALS now? (Why not just change its behaviour to that of PARALLEL_ALLGOALS?)

So this is just a routine case of updating existing AFP entries. What is less routine is the total run-time of that session, which makes maintenance more difficult. I have already reduced it a lot by using the newer 'subgoal' Isar command, but that is for the next Isabelle release.

Sorry about that.

I had in mind to revisit it now-ish, but you’ve beaten me to it. The idea was to turn vcg_jackhammer into something case-like, so the larger proofs would look like:

proof vcg_jackhammer
case (statement_label x y z) then show ?thesis
< existing apply gloop >

We talked about this earlier. I now have time to chase up your pointers. I hoped this would yield the kind of parallelism your “subgoal”s do now, with the advantage that it’d be more maintainable. Perhaps it’s a wash at this stage.

thanks,
peter

view this post on Zulip Email Gateway (Aug 22 2022 at 11:57):

From: Makarius <makarius@sketis.net>
On Tue, 17 Nov 2015, Lars Hupel wrote:

(What is the point of running single-threaded these days?)

I only discovered that as part of my investigations into how to reliably
build the AFP. There are essentially two axes for parallelism: Internal
to a session (multi-threaded, e.g. parallel proofs, parallel theories)
and between sessions (multi-process, i.e. independent AFP sessions can
be run in parallel).

Sequentialism is indeed a very atypical situation, more than 10 years
after multicore hardware became mainstream.

In the various different isatest configurations for main Isabelle (not
AFP) we do indeed test normal situations, like threads=4 or threads=8,
alongside abnormal ones like threads=1 or skip_proofs=true.

Such tests provide empirical evidence that the "physics" of a
highly-complex parallel ML and proof engine comes out as reliably
"mathematics", i.e. results are determined. Asking for determinism in
every respect is a bit much.

Gerwin and I suspect that it would be most efficient to exploit the
latter at the cost of the former, since the AFP dependency graph is
quite sparse.

What always happens in such situations is that the total runtime converges
to the longest sequential task. That used to be JinjaThreads, now it is
AODV or ConcurrentGC. These take many hours of CPU time. With a good
multicore machine of 2015, it should be possible to run all of Isabelle +
AFP in approximately 1h elapsed time, maybe even less after some tweaking.

Makarius

view this post on Zulip Email Gateway (Aug 22 2022 at 11:59):

From: Makarius <makarius@sketis.net>
The relevant changeset for Isabelle_Meta_Model is AFP/c4abbb09645a, but it
merely conflates proper use of (PARALLEL_GOALS o ALLGOALS) to
PARALLEL_ALLGOALS.
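
For concreteness, a hypothetical sketch of what such a conflation looks like
inside a method definition (not the actual Isabelle_Meta_Model or ConcurrentGC
code; the method name "par_simp_all" is made up):

ML ‹
  (* before: SIMPLE_METHOD (PARALLEL_GOALS (ALLGOALS (asm_full_simp_tac ctxt))) *)
  (* after, with the Isabelle2015 shortcut: *)
  val _ =
    Theory.setup
      (Method.setup @{binding par_simp_all}
        (Scan.succeed (fn ctxt =>
          SIMPLE_METHOD (PARALLEL_ALLGOALS (asm_full_simp_tac ctxt))))
        "simplify all subgoals, possibly in parallel")
›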

The remaining change for ConcurrentGC is included here -- I leave it to
the author/maintainer to apply it on afp-2015 and/or afp-devel. (E.g.
with "hg import".)

That is the result of staring a rather long time at not-quite-canonical
tactic definitions, to figure out how subgoal addressing is used.
Non-standard indentation and overlong source lines make it more difficult
to read the text than necessary.

Earlier in this thread, I claimed that there was merely a confusion of
PARALLEL_GOALS vs. the newer PARALLEL_ALLGOALS, but basic goal addressing
without parallelism was already unclear. Here are relevant paragraphs
from the "implementation" manual:

Operating on a particular subgoal means to replace it by an interval
of zero or more subgoals in the same place; other subgoals must not
be affected, apart from instantiating schematic variables ranging
over the whole goal state.

A common pattern of composing tactics with subgoal addressing is to
try the first one, and then the second one only if the subgoal has
not been solved yet. Special care is required here to avoid bumping
into unrelated subgoals that happen to come after the original
subgoal. Assuming that there is only a single initial subgoal is a
very common error when implementing tactics!

...

Some of these conditions are checked by higher-level goal
infrastructure (\secref{sec:struct-goals}); others are not checked
explicitly, and violating them merely results in ill-behaved tactics
experienced by the user (e.g.\ tactics that insist in being
applicable only to singleton goals, or prevent composition via
standard tacticals such as @{ML REPEAT}).

Testing this minor change required an exceedingly long time, just to get 4-5
attempts to work out eventually.

Are there deeper reasons for these proofs to be so slow? The elapsed time
of ConcurrentIMP is about that of all of the rest of AFP taken together.

As another peephole optimization, it should be possible to conflate

apply m1
apply m2
done

to

by m1 m2

This is forked in Isabelle/jEdit. Unlike in batch builds, more complex
proofs are still not forked in the Prover IDE after all these years.

Makarius
ch

view this post on Zulip Email Gateway (Aug 22 2022 at 11:59):

From: Makarius <makarius@sketis.net>
(Back to this thread, which is not yet closed in all its sub-threads.)

Using the 'subgoal' command of future Isabelle2016 did indeed make a
significant difference some months ago. Nonetheless there seems to be
room for more efficiency of the basic proof tools used here.

Just from the English, what does "Perhaps it’s a wash at this stage."
mean?

Makarius

view this post on Zulip Email Gateway (Aug 22 2022 at 12:00):

From: Makarius <makarius@sketis.net>
On Thu, 19 Nov 2015, Makarius wrote:

The remaining change for ConcurrentGC is included here -- I leave it to the
author/maintainer to apply it on afp-2015 and/or afp-devel. (E.g. with "hg
import".)

That is the result of staring a rather long time at not-quite-canonical
tactic definitions, to figure out how subgoal addressing is used.

That is almost 3 weeks old, and nothing has happened yet. I am routinely
on afp-devel, but not the official release branches. Ultimately, it is
the job of the AFP authors to maintain their material.

Testing this minor change required an exceedingly long time, just to get 4-5
attempts to work out eventually.

Are there deeper reasons for these proofs to be so slow? The elapsed time of
ConcurrentIMP is about that of all of the rest of AFP taken together.

As another peephole optimization, it should be possible to conflate

apply m1
apply m2
done

to

by m1 m2

This is forked in Isabelle/jEdit. Unlike in batch builds, more complex proofs
are still not forked in the Prover IDE after all these years.

This is also something that can be easily done to improve the
maintainability of ConcurrentGC.

Makarius
ch

view this post on Zulip Email Gateway (Aug 22 2022 at 12:00):

From: Peter Gammie <peteg42@gmail.com>
On 7 Dec 2015, at 22:19, Makarius <makarius@sketis.net> wrote:

That is almost 3 weeks old, and nothing has happened yet. I am routinely on afp-devel, but not the official release branches. Ultimately, it is the job of the AFP authors to maintain their material.

Makarius, I am on holidays presently and will not get to this until February 2016 or later.

You have previously pushed substantial changes to ConcurrentGC without discussing them with me at all, and I don’t see why this should be any different. If it is important to you, go ahead and commit it. It should be clear that I simply want those tactics to work - I have no deep insight into the Isabelle tactic API, for otherwise I would have got it right before now. If it were my project, I would take this as an opportunity to make the Isabelle tactic API more robust, so people like me don’t cause maintenance headaches for people like you.

As for runtimes: Florian’s email on isabelle-dev 2015-11-26 (why is this thread I’m responding to on isabelle-users? Aren’t we talking about the development version of the AFP for the upcoming Isabelle2016 release? brain explodes):

For the record:

Running on host lxbroy10
isabelle: fc53fbf9fe01 tip
afp: 835c7e115feb tip
Running ConcurrentGC ...
Finished ConcurrentGC (1:08:48 elapsed time, 5:15:14 cpu time, factor 4.58)
1:21:59 elapsed time, 5:53:13 cpu time, factor 4.30

Five hours of CPU seems excessive. Is this machine ancient? I don’t remember it being this bad when I submitted it to the AFP.

Just from the English, what does “Perhaps it’s a wash at this stage.” mean?

As for English idioms, I would start by asking Google; it seems the phrase “it’s a wash” has a few interpretations. I humbly suggest you avoid using the passive voice as it tends to come across as aggressive. In many cases it distracts me and other subscribers from the often valid points you are making.

thanks,
peter

view this post on Zulip Email Gateway (Aug 22 2022 at 12:00):

From: Tobias Nipkow <nipkow@in.tum.de>
On 08/12/2015 00:15, Jeremy Dawson wrote:

On 08/12/15 02:19, Makarius wrote:

Ultimately, it is the job of the AFP authors to maintain their material.

That is indeed what the AFP web pages say.

Huh? When I was urged to convert all my proofs to Isar so that they could go
into the AFP, I was told

(quote)
Once your theories are in the AFP, every developer who makes a
change that breaks any of the theories in the AFP (or the Isabelle
distribution) is responsible for fixing it, which is usually not too
difficult, since the developer knows what kind of changes he has made.
(end quote)

That is the practice in 99% of the cases.

Tobias

view this post on Zulip Email Gateway (Aug 22 2022 at 12:00):

From: Makarius <makarius@sketis.net>
On Tue, 8 Dec 2015, Peter Gammie wrote:

You have previously pushed substantial changes to ConcurrentGC without
discussing them with me at all, and I don’t see why this should be any
different. If it is important to you, go ahead and commit it.

I have no problem pushing that to afp-devel, but then it won't be on
afp-2015. The start of the thread (and its subject line) is about
afp-2015: Lars Hupel reported problems that are relevant for that release
branch, for continuous testing in sequential mode.

Isabelle and the AFP are not just a sink for material. It is continuously
crunched and continuously improved, especially for performance and
robustness.

It should be clear that I simply want those tactics to work - I have no
deep insight into the Isabelle tactic API, for otherwise I would have
got it right before now. If it were my project, I would take this as an
opportunity to make the Isabelle tactic API more robust, so people like
me don’t cause maintenance headaches for people like you.

The tactic API is from 1989 and cannot be changed without changing almost
everything of the Paulson goal state representation (which was a big asset
in its time, and still has many benefits that other ITP systems lack).

The "implementation" manual is quite explicit about what the ML type
tactic (and Proof.method) mean semantically. It is OK to ignore manuals
in the first round. It is not OK to complain about the existence of
manuals.

Running on host lxbroy10
isabelle: fc53fbf9fe01 tip
afp: 835c7e115feb tip
Running ConcurrentGC ...
Finished ConcurrentGC (1:08:48 elapsed time, 5:15:14 cpu time, factor 4.58)
1:21:59 elapsed time, 5:53:13 cpu time, factor 4.30

Five hours of CPU seems excessive. Is this machine ancient? I don’t
remember it being this bad when I submitted it to the AFP.

The machine is ancient, but ConcurrentGC has always been that slow.
Before I made some changes with 'subgoal' it was even slower -- by a
factor of approx. 1.5.

Just from the English, what does “Perhaps it’s a wash at this stage."
mean?

As for English idioms, I would start by asking Google; it seems the
phrase “it’s a wash” has a few interpretations. I humbly suggest you
avoid using the passive voice as it tends to come across as aggressive.
In many cases it distracts me and other subscribers from the often valid
points you are making.

Sorry, I don't know English slang, only international scientific pidgin.

So far the aggression has been mostly on the side of the massive
ConcurrentGC session, which has sucked up quite a lot of resources from
everybody involved in ongoing Isabelle + AFP maintenance.

It is still unclear to me why these proofs are so slow. You, as the
author, should know.

Makarius

view this post on Zulip Email Gateway (Aug 22 2022 at 12:01):

From: Gerwin Klein <Gerwin.Klein@nicta.com.au>
Let’s make the change in afp-devel and see how much improvement we get over time. It won’t be visible until the next release, but afp-devel is the side that is run more often in any case.

ch.txt was meant to apply to afp-devel, correct? I can have a look at that.

Cheers,
Gerwin

ps: “it’s a wash” means “comes out to roughly the same thing”



view this post on Zulip Email Gateway (Aug 22 2022 at 12:01):

From: Makarius <makarius@sketis.net>
Yes.

Makarius

