[isabelle] System stuck after try0 calls in separate thread · Mirror: Isabelle Users Mailing List

Stream: Mirror: Isabelle Users Mailing List

Topic: [isabelle] System stuck after try0 calls in separate thread

Email Gateway (Dec 09 2024 at 08:13):

From: Fabian Huch <huch@in.tum.de>
I want to run the mirabelle try0 action in a larger session, e.g. HOL-ex.

However, this doesn't quite work: the tools invoked by try0 can
sometimes run out of control and fill up the stack, so the thread gets
terminated:

Warning - Unable to increase stack - interrupting thread
HOL-ex FAILED (see also "isabelle build_log -H Error HOL-ex")
*** exception Interrupt_Breakdown raised (line 77 of
"./basis/PolyMLException.sml")

So my solution would be to wrap the try0 call in a separate thread, and
count any exceptions within that thread as 'error' (see attached patch).

This does work well in principle, but after a few thousand try0 calls
(with the invocation 'isabelle mirabelle -A try0 HOL-ex'), the Isabelle
process gets stuck: There is no further progress (judged by
printline-debugging), and several CPU cores stay at 100% usage. This
happens with both Isabelle2024 and current devel (and my actual
application, which does not use mirabelle).

Any ideas? From My guess would be that this has to do with thread-local
memory that is not properly deallocated, so the system is stuck in a GC
loop. From further experimentation I can conclude that the thread is
started, but the system never wakes up from the 'OS.Process.sleep' call
that happens in 'Isabelle_Thread.join'. I've let the process run for
several days but it did not make any progress.

Fabian

try0_problem.patch

Email Gateway (Dec 09 2024 at 09:03):

From: Yutaka Nagashima <united.reasoning@gmail.com>
Hi Fabian,

I’ve encountered similar issues and explored approaches along the same
lines.
For some reason, Isabelle_Thread.joinand Isabelle_Thread.fork didn’t behave
as I expected. However, leveraging Future helped me resolve these issues in
my tests so far.

Jonathan’s code snippet appears straightforward and easy to use.
https://github.com/yonoteam/DeepIsaHOL/blob/292aa8ca66db998b98c49aec9a8b630c96fda656/src/main/ml/ops.ML#L202

I’d be interested to hear Makarius’ thoughts on this, as it seems to
address a common challenge faced by those integrating Isabelle’s subtools
into larger automation workflows.

Regards,
Yutaka

On Mon, Dec 9, 2024 at 1:43 PM Fabian Huch <huch@in.tum.de> wrote:

I want to run the mirabelle try0 action in a larger session, e.g. HOL-ex.

However, this doesn't quite work: the tools invoked by try0 can
sometimes run out of control and fill up the stack, so the thread gets
terminated:

Warning - Unable to increase stack - interrupting thread
HOL-ex FAILED (see also "isabelle build_log -H Error HOL-ex")
*** exception Interrupt_Breakdown raised (line 77 of
"./basis/PolyMLException.sml")

So my solution would be to wrap the try0 call in a separate thread, and
count any exceptions within that thread as 'error' (see attached patch).

This does work well in principle, but after a few thousand try0 calls
(with the invocation 'isabelle mirabelle -A try0 HOL-ex'), the Isabelle
process gets stuck: There is no further progress (judged by
printline-debugging), and several CPU cores stay at 100% usage. This
happens with both Isabelle2024 and current devel (and my actual
application, which does not use mirabelle).

Any ideas? From My guess would be that this has to do with thread-local
memory that is not properly deallocated, so the system is stuck in a GC
loop. From further experimentation I can conclude that the thread is
started, but the system never wakes up from the 'OS.Process.sleep' call
that happens in 'Isabelle_Thread.join'. I've let the process run for
several days but it did not make any progress.

Fabian

Email Gateway (Dec 09 2024 at 10:23):

From: Lukas Bartl <L.Bartl@campus.lmu.de>
Hi Fabian,

I can confirm this issue. I also encountered these exceptions and Isabelle also got stuck when I used the mirabelle sledgehammer action. This happened with the default option "try0" which tries different proof methods similar to the try0 command (there were no errors when only the "metis" proof method was used).

Lukas

Am 9. Dezember 2024 09:07:09 MEZ schrieb Fabian Huch <huch@in.tum.de>:

I want to run the mirabelle try0 action in a larger session, e.g. HOL-ex.

However, this doesn't quite work: the tools invoked by try0 can sometimes run out of control and fill up the stack, so the thread gets terminated:

Warning - Unable to increase stack - interrupting thread
HOL-ex FAILED (see also "isabelle build_log -H Error HOL-ex")
*** exception Interrupt_Breakdown raised (line 77 of "./basis/PolyMLException.sml")

So my solution would be to wrap the try0 call in a separate thread, and count any exceptions within that thread as 'error' (see attached patch).

This does work well in principle, but after a few thousand try0 calls (with the invocation 'isabelle mirabelle -A try0 HOL-ex'), the Isabelle process gets stuck: There is no further progress (judged by printline-debugging), and several CPU cores stay at 100% usage. This happens with both Isabelle2024 and current devel (and my actual application, which does not use mirabelle).

Any ideas? From My guess would be that this has to do with thread-local memory that is not properly deallocated, so the system is stuck in a GC loop. From further experimentation I can conclude that the thread is started, but the system never wakes up from the 'OS.Process.sleep' call that happens in 'Isabelle_Thread.join'. I've let the process run for several days but it did not make any progress.

Fabian

Email Gateway (Dec 09 2024 at 11:28):

From: "Eugene W. Stark" <isabelle-users@starkeffect.com>
I don't use Mirabelle, but since a thread has appeared on this type of issue I will add
my observations (which I have posted about in the past) about continuing issues of this nature:

(*) It is possible, during a normal session with Isabelle/JEdit, for the system to go unstable,
consume memory up to any limits I have set (I have generally been using 64GB for Poly/ML),
and then (often) for Poly/ML to be killed by the Linux OOM killer. As best as I can tell,
the actual culprits may be external provers that are invoked via "smt", rather than Poly/ML
itself, which usually ends up being the victim. The situation usually occurs when, due to
changes earlier in the proof document, a subsequent proof "by smt" is rendered non-terminating
and gets triggered by the normal continuous checking feature. It is quite often the case
that I get up from the computer and come back to find that this has occurred. I try to
work around this issue by making sure there are no proof methods with a "purple" color
if I am going to leave the session alone for some time.

(*) It appears that under certain conditions Poly/ML can leak heap. Symptoms are that the heap
size gets closer and closer to the limit over time (again, I typically use 64GB), until
useful work becomes impossible and it is necessary to restart and reload everything
(after which the heap is much smaller).
Sometimes the system will go into an "infinite GC" as the limit is neared, which sounds
very much like what others on this thread are posting about. Note that my workflow involves
frequent invocations of sledgehammer. Lately I have been using the sledgehammer panel
"Apply" button for this, whereas in the past I was in the habit of using "try" and "try0",
but the behavior seems similar.

I speculate that this issue may be related to the frequent abrupt terminations of external
provers -- possibly there are weak references or something to stuff in the Poly/ML heap
that do not get freed sometimes when an external provers terminate abruptly.
Usually during a session I get several pop-ups from the Linux "apparmor" reporting on the
abrupt termination of cvc4, etc. via a signal. These do not always seem to indicate errors
in the external provers, but rather may derive from incorrect signal handling in glue code
that causes the underlying system to report on the occurrence of these signals.

(*) Poly/ML can embark on full GC's that take many, many minutes (this with a 64GB heap limit,
running on a machine with 128GB RAM -- no thrashing is occurring). Sometimes the GC
will not terminate after an hour or more, in which case it is necessary to restart the
session and reload everything. Symptoms while this is occurring usually seems to be
cores pegged at 100% utilization (according to "top"), with no change for a long time.
Usually GC's that are making progress will switch back and forth between 100% on one core
and 100% on multiple cores, as opposed to appearing stagnant.

System details: Intel i9/10850K (10 cores) with 128GB RAM, running Ubuntu 22.04.5 LTS
(GNU/Linux 6.8.0-49-generic x86_64).

- Gene Stark

On 12/9/24 05:19, Lukas Bartl wrote:

Hi Fabian,

I can confirm this issue. I also encountered these exceptions and Isabelle also got stuck when I used the mirabelle sledgehammer action. This happened with the default option "try0" which tries different proof methods similar to the try0 command (there were no errors when only the "metis" proof method was used).

Lukas

Am 9. Dezember 2024 09:07:09 MEZ schrieb Fabian Huch <huch@in.tum.de>:

I want to run the mirabelle try0 action in a larger session, e.g. HOL-ex.

However, this doesn't quite work: the tools invoked by try0 can sometimes run out of control and fill up the stack, so the thread gets terminated:

Warning - Unable to increase stack - interrupting thread
HOL-ex FAILED (see also "isabelle build_log -H Error HOL-ex")
*** exception Interrupt_Breakdown raised (line 77 of "./basis/PolyMLException.sml")

So my solution would be to wrap the try0 call in a separate thread, and count any exceptions within that thread as 'error' (see attached patch).

This does work well in principle, but after a few thousand try0 calls (with the invocation 'isabelle mirabelle -A try0 HOL-ex'), the Isabelle process gets stuck: There is no further progress (judged by printline-debugging), and several CPU cores stay at 100% usage. This happens with both Isabelle2024 and current devel (and my actual application, which does not use mirabelle).

Any ideas? From My guess would be that this has to do with thread-local memory that is not properly deallocated, so the system is stuck in a GC loop. From further experimentation I can conclude that the thread is started, but the system never wakes up from the 'OS.Process.sleep' call that happens in 'Isabelle_Thread.join'. I've let the process run for several days but it did not make any progress.

Fabian

Email Gateway (Dec 11 2024 at 15:17):

From: Fabian Huch <huch@in.tum.de>

On 12/9/24 09:53, Yutaka Nagashima wrote:

I’ve encountered similar issues and explored approaches along the same
lines.
For some reason, Isabelle_Thread.joinand Isabelle_Thread.fork didn’t
behave as I expected. However, leveraging Future helped me resolve
these issues in my tests so far.

Jonathan’s code snippet appears straightforward and easy to use.
https://github.com/yonoteam/DeepIsaHOL/blob/292aa8ca66db998b98c49aec9a8b630c96fda656/src/main/ml/ops.ML#L202

This works better because Future uses a "worker" thread that stays alive
during the whole execution.

Curiously, when I run my real application with Futures, it does finish
all the work (after a few days) -- but the Isabelle process never stops
and there are two CPU cores at full load still (though my application
calls sledgehammer, which does spawn its own threads).

Best,

Fabian

Last updated: Jul 05 2025 at 20:22 UTC