Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] Parallelism in Isabelle 2009-1


view this post on Zulip Email Gateway (Aug 18 2022 at 15:03):

From: Thomas Sewell <Thomas.Sewell@nicta.com.au>
Hey Isabelle users.

I've been playing with Isabelle 2009-1's parallelism features for a
while now, and have come to the conclusion that I can never get quite
what I want from them. There are two problems. On my desktop machine at
work, I can't safely enable parallelism. On my desktop machine at home,
I can't completely disable parallelism.

The first problem is that the L4.verified proof repository I work on is
huge (tens of thousands of lemmas and hundreds of thousands of lines of
proof). This exposes a problem with parallel proof checking. CPU usage
looks good to begin with but then our systems very rapidly run out of
memory and start thrashing. I suspect the problem here is that for every
proof saved for later checking the relevant context needs to saved as
well. Our contexts get quite large, so the theory-parsing thread doesn't
have to get far ahead of the proof-checking threads for memory to run
out. The solution to this is to turn off parallel proof checking.

Furthermore, although Isabelle 2009-1 loads theories in parallel this
does not seem to translate to CPU parallelism. It is clear from the log
chatter that multiple theories are being loaded at once, however, when
parallel proofs are disabled CPU usage remains at one core's worth.
Running UNIX top and showing thread decompositions (press "H") reveals
that there are multiple polyml threads running but their CPU usage
always adds to exactly one core. It looks a lot like there's a lock
somewhere all the threads need to run. This prevents me getting any
parallel execution on my work machine.

Finally, on my machine at home, which seems to act up in the context of
parallelism, I can't disable it. I can turn off parallel proof checking
and set the maximum thread count to 1. UNIX top confirms that this
reduces the number of polyml threads created from 7 to 2, which then
share the CPU between them in exactly the same way. This prevents me
disabling real parallelism on my home machine. I can still run a busy
loop on the other core, which seems to hide the mysterious issues.

Do other users see the same behaviour? Do the developers know what's
going on? Could it be easily fixed?

Yours,
Thomas.

view this post on Zulip Email Gateway (Aug 18 2022 at 15:03):

From: Makarius <makarius@sketis.net>
On Fri, 9 Apr 2010, Thomas Sewell wrote:

The first problem is that the L4.verified proof repository I work on is
huge (tens of thousands of lemmas and hundreds of thousands of lines of
proof). This exposes a problem with parallel proof checking. CPU usage
looks good to begin with but then our systems very rapidly run out of
memory and start thrashing. I suspect the problem here is that for every
proof saved for later checking the relevant context needs to saved as
well. Our contexts get quite large, so the theory-parsing thread doesn't
have to get far ahead of the proof-checking threads for memory to run
out. The solution to this is to turn off parallel proof checking.

In a low memory situation, you have two options to play with:

* Multithreading.max_threads (usedir option -M)
* Goal.parallel_proofs (usedir option -q)

See also section 3.4 in the Isabelle System manual for some further
explanations. In particulay, -q1 vs. -q2 might sometimes help.

Since your application is rather big, I would say the real solution is to
install more memory. As a rule of thumb, you need to scale up memory with
the number of cores. Less than 2 GB per core is unlikely to work here.
Since the total heap size of Poly/ML is limited to 2-3 GB on 32bit, you
might have to switch to 64bit by editing the ML settings accordingly.
There is an example in Isabelle2009-1/etc/settings. Multiply the memory
by 1.5 or even 2 for 64bit mode.

Furthermore, although Isabelle 2009-1 loads theories in parallel this
does not seem to translate to CPU parallelism. It is clear from the log
chatter that multiple theories are being loaded at once, however, when
parallel proofs are disabled CPU usage remains at one core's worth.
Running UNIX top and showing thread decompositions (press "H") reveals
that there are multiple polyml threads running but their CPU usage
always adds to exactly one core. It looks a lot like there's a lock
somewhere all the threads need to run. This prevents me getting any
parallel execution on my work machine.

The degree of parallelism in typical theory development graphs is
surprisingly low. Nonetheless, you should get some speedup, unless the
graph is all linear.

If CPU usage is really just 100% there might be some ML tools using
something like setmp_CRITICAL internally. The official Isabelle
distribution is essentially free from such exclusive execution, but maybe
you have your own tools. Or maybe it is just some critical wrapper around
the main use_thys invocation in ROOT.ML of the session (only relevant for
batch mode).

Finally, on my machine at home, which seems to act up in the context of
parallelism, I can't disable it. I can turn off parallel proof checking
and set the maximum thread count to 1. UNIX top confirms that this
reduces the number of polyml threads created from 7 to 2, which then
share the CPU between them in exactly the same way. This prevents me
disabling real parallelism on my home machine. I can still run a busy
loop on the other core, which seems to hide the mysterious issues.

I have just tried ISABELLE_USEDIR_OPTIONS="-M 1" on my 2-core laptop and
it works as expected: there are only 2 threads, the main ML thread
(typically 80% CPU time) and the GC thread (20% CPU time). Total CPU
usage is < 50% accounting for the 2 cores, i.e. this is really single-core
mode. The ML thread and GC thread are alternating each other.

Makarius


Last updated: Mar 28 2024 at 12:29 UTC