[isabelle] Isabelle2013-2-RC3 ATP e_par acid test · Archive Mirror: Isabelle Users Mailing List

Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] Isabelle2013-2-RC3 ATP e_par acid test

Email Gateway (Aug 19 2022 at 13:14):

From: Gottfried Barrow <gottfried.barrow@gmx.com>
Makarius,

The short story is that I can reliably get non-terminating processes (or
processes that can't be shut down, and that will run for a long time),
but so far, I can only do that with the ATP e_par, and when it's started
with a bunch of other ATPs.

The theorem I use for the test has to be put it at the bottom of a
theory, so that Sledgehammer can find some relevant facts.

The test is to start Sledgehammer, and then terminate it prematurely by
deleting the "r" in "Sledgehammer", sometimes soon, and sometimes later.
I actually have to let e_par run for a while to get the right conditions
for it to not terminate.

I didn't spend a lot of time experimenting with the command I show next,
which doesn't use e_par, but it uses most every ATP that's available to
me, and I haven't seen any problems from prematurely terminating the
Sledgehammer command. This is the command:

ML {* Multithreading.max_threads_value () *}
ML {* Multithreading.max_threads := 300 *}
ML {* Multithreading.max_threads_value () *}

theorem "False = True"
sledgehammer [minimize=smart,preplay_timeout=10,timeout=60,verbose=true,
isar_try0=true,isar_proofs=smart,provers="
remote_vampire metis remote_satallax z3_tptp remote_e smt
remote_e_tofof spass remote_e_sine e z3 yices

remote_snark remote_iprover remote_z3 remote_waldmeister
spass_poly
remote_cvc3 remote_agsyhol remote_leo2 remote_iprover_eq
leo2 cvc3
"]
oops

From here, two key points are that e_par starts up 12 eprover.exe
processes regardless of what max_threads is set to, that the number and
order of ATPs run with it affect things, and "max_thread := 4" is what
sets up the particular sequence I list below.

I give the Isar command below, but I work it like this:

1) I start the command.
2) Lots of eprover processes will be started and terminated, several times.
3) At about 40 seconds using my CPU, the ATP "leo" will be running with
12 eprover.exe processes, during a time when processes aren't being shut
down and started (shown by green and red lines using
http://systemexplorer.net/ )
4) I delete the "r" in "Sledgehammer".
5) The processes keep running much long than what I'm willing to wait.
(I have waited up to 15 minutes to see if they would terminate, and they
didn't.)
6) I exit the PIDE. Processes are terminated except for the ATP related
processes. I attach a screen shot showing the many non-terminating
processes.

The command is this:

ML {* Multithreading.max_threads_value () *}
ML {* Multithreading.max_threads := 4 *}
ML {* Multithreading.max_threads_value () *}

theorem "False = True"

sledgehammer [minimize=smart,preplay_timeout=10,timeout=60,verbose=true,
isar_try0=true,isar_proofs=smart,provers="
e_par e leo2 z3 metis z3_tptp spass yices cvc3 smt spass_poly
"]

oops

The steps listed above don't always work, but they probably work more
than 50% of the time.

In the PIDE options, the default for "Threads" is "0", which is what
controlled my "max_thread" until I started changing "max_threads" with
the ML command, which affected me being able to get the sequence of
events to happen as I described above. The value "max_threads := 4" gets
the same results as the "Threads" set to "0". I have a 4 core CPU with
no hyperthreading.

This is no problem for me. The ATP e_par actually terminates much of the
time, and I haven't been using it because of the large number of
processes that it starts.

Regards,
GB
131204_non-term-processes.png

Email Gateway (Aug 19 2022 at 13:14):

From: Gottfried Barrow <gottfried.barrow@gmx.com>
I don't actually have to terminate the Sledgehammer command. This worked
three times in a row:

1) Log out and back into my Windows account.
2) Run the command I showed with max_threads set to 4.
3) It gets to where "leo" and 12 "eprover.exe" processes are running,
and they don't timeout like they should.

I think it's related to the number of threads, but not completely. I can
change max_threads to 3, and e_par times out like it should. I change it
back to 4, and it can time out like it should.

I did the log out/log in sequence with max_threads set to 3, and it did
like with 4, except "leo" wasn't running, but it didn't do it twice in a
row. The number 4 seems worse, but 3 fails, and every computer might be
different.

Regards,
GB

Email Gateway (Aug 19 2022 at 13:14):

From: Makarius <makarius@sketis.net>
On Wed, 4 Dec 2013, Gottfried Barrow wrote:

The short story is that I can reliably get non-terminating processes (or
processes that can't be shut down, and that will run for a long time),
but so far, I can only do that with the ATP e_par, and when it's started
with a bunch of other ATPs.

This is the first time I learn about the existence of e_par. It seems to
be an add-on script to E prover by Josef Urban. Maybe Jasmin can say
something about the degree of supportedness within the Sledgehammer suite.

My general impression is that there are many experimental add-ons that are
not widely used nor tested on the usual platforms.

I didn't spend a lot of time experimenting with the command I show next,
which doesn't use e_par, but it uses most every ATP that's available to
me, and I haven't seen any problems from prematurely terminating the
Sledgehammer command. This is the command:

ML {* Multithreading.max_threads_value () *}
ML {* Multithreading.max_threads := 300 *}
ML {* Multithreading.max_threads_value () *}

theorem "False = True"
sledgehammer [minimize=smart,preplay_timeout=10,timeout=60,verbose=true,
isar_try0=true,isar_proofs=smart,provers="
remote_vampire metis remote_satallax z3_tptp remote_e smt
remote_e_tofof spass remote_e_sine e z3 yices

remote_snark remote_iprover remote_z3 remote_waldmeister
spass_poly
remote_cvc3 remote_agsyhol remote_leo2 remote_iprover_eq leo2
cvc3
"]
oops

From here, two key points are that e_par starts up 12 eprover.exe processes
regardless of what max_threads is set to, that the number and order of ATPs
run with it affect things, and "max_thread := 4" is what sets up the
particular sequence I list below.

300 threads are quite a lot, but I suggested to go towards and beyond the
limits of what is feasible :-)

The e_par implementation ($E_HOME/runepar.pl) has its own process
management and allocation of CPU resources. It does not take the number
of ML threads into account. Looking a bit through the script, I see
various potential weaknesses and non-portabilities. E.g. there is ulimit,
which only works reliably on various brands of Unix, not the generic
quasi-POSIX of Isabelle that includes Cygwin. It is also unclear if its
internal signal handling is really robust (e.g. the Cleanup does not wait
for the process farm.)

A general note on Multithreading.max_threads: changing that in Isabelle/ML
might lead to some surprises, because the PIDE protocol will update its
value after some change of properties, using the value that is given in
Isabelle/Scala, which is the same that you see in Isabelle/jEdit in the
dialog "Plugin Options / Isabelle / General / Threads".

ML {* Multithreading.max_threads_value () *} is fine to check what
Isabelle/ML is presently using.

I give the Isar command below, but I work it like this:

1) I start the command.
2) Lots of eprover processes will be started and terminated, several times.
3) At about 40 seconds using my CPU, the ATP "leo" will be running with 12
eprover.exe processes, during a time when processes aren't being shut down
and started (shown by green and red lines using http://systemexplorer.net/ )

Does that mean you have a local installation of leo? I've never looked
at it myself, and don't know where to get it. It might have once again
its very own process management done in OCaml.

4) I delete the "r" in "Sledgehammer".
5) The processes keep running much long than what I'm willing to wait. (I
have waited up to 15 minutes to see if they would terminate, and they
didn't.)
6) I exit the PIDE. Processes are terminated except for the ATP related
processes. I attach a screen shot showing the many non-terminating
processes.

We've had that already: switching off the light with the hammer --- by
shutting down the outermost PIDE application --- does not really help.
It can leave processes behind at the bottom of the implicit hierarchy.

POSIX processes are relatively difficult to control systematically. One
needs to look at particular implementations, to see that they are within
reason of fork, wait, signal handling etc.

This is no problem for me. The ATP e_par actually terminates much of the
time, and I haven't been using it because of the large number of
processes that it starts.

Indeed. It seems to be an experiment to allocate as many CPU cores as
feasible, with different strategies for E tried in parallel.

Makarius

Email Gateway (Aug 19 2022 at 13:14):

From: Jasmin Christian Blanchette <jasmin.blanchette@gmail.com>
Yes, "e_par" should be considered experimental. Beyond the technical issues, it didn't perform well on our benchmarks.

Jasmin

Email Gateway (Aug 19 2022 at 13:15):

From: Gottfried Barrow <gottfried.barrow@gmx.com>
On 12/5/2013 7:59 AM, Makarius wrote:

300 threads are quite a lot, but I suggested to go towards and beyond
the limits of what is feasible :-)

I have about 70 processes running normally, and with threads set to 300,
only a maximum of about 140 processes are ever running for the "False =
True" test. An ATP running as a thread sometimes starts up lots of
processes of its own. With threads set to 4, a max of about 90 processes
will be running on my computer.

I think I'll try threads set to 12 for a while, even though you've said
it should be set to the number of physical cores. One of the new remote
ATPs that came with Isablle2013-1 is "remote_agsyhol", which is finding
proofs.

I've added that to my "first pass ATPs", so here are the 12 I use for a
first pass. Setting threads to 12 lets all of the them start (or maybe
it needs to be more than 12), and many of them will sometimes timeout fast:

remote_vampire metis remote_satallax z3_tptp remote_e remote_agsyhol
remote_e_tofof spass remote_e_sine e z3 yices

A general note on Multithreading.max_threads: changing that in
Isabelle/ML might lead to some surprises, because the PIDE protocol
will update its value after some change of properties, using the value
that is given in Isabelle/Scala, which is the same that you see in
Isabelle/jEdit in the dialog "Plugin Options / Isabelle / General /
Threads".

ML {* Multithreading.max_threads_value () *} is fine to check what
Isabelle/ML is presently using.

Before, I didn't see max_threads_value change after changing "Threads",
but now I see that if I save the file, max_threads_value will display
what I set "Threads" to in the options.

Does that mean you have a local installation of leo? I've never
looked at it myself, and don't know where to get it. It might have
once again its very own process management done in OCaml.

Yes. I have a local install. Your colleague must have forgot to mention
it to you:

http://www.cl.cam.ac.uk/~lp15/Grants/leo2.html

You may be able to ask him about it in Houston. Texas is big, so they
say. You wouldn't want to get lost. With your good English grammar,
Texans might not be able to understand you.

http://www.nasaformalmethods.org/?page_id=104

I haven't had to do anything special to keep leo2 running since
Isabelle2012, unlike satallax, which doesn't run anymore because I
haven't downloaded what's needed for Cygwin.

For anyone interested, there are a number of ATPs which can be run
locally, that aren't part of the distribution.

Vampire, http://www.vprover.org/ (The version is up to 3.0, but the
downloads haven't been available for a long time, and local vampire
won't run on Windows)

Leo2, http://www.ags.uni-sb.de/~leo/ (the link is not responding for me
right now)

Satallax, http://www.ps.uni-saarland.de/~cebrown/satallax/

Yices, http://yices.csl.sri.com/download.shtml

Regards,
GB

Last updated: Oct 26 2025 at 20:22 UTC