Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] AFP (session Native_Word): test_code failed


view this post on Zulip Email Gateway (Aug 22 2022 at 16:30):

From: Christian Sternagel <c.sternagel@gmail.com>
Dear all,

I am in the strange situation that on a workstation Native_Word builds,
while on my laptop (with, as far as I can tell, almost identical setup)
there is a problem with code generation (see the attached log file).

On both machines I use Isabelle2017 and afp-2017 ( 5af4ceb1cc28 ). On
the workstation I have

$ isabelle build -v Native_Word
Started at Tue Dec 12 13:13:57 GMT+1 2017 (polyml-5.6_x86_64-linux on
cllabws1)
ISABELLE_BUILD_OPTIONS="threads=12"

ML_PLATFORM="x86_64-linux"
ML_HOME="/home/griff/Downloads/Isabelle2017/contrib/polyml-5.6-1/x86_64-linux"
ML_SYSTEM="polyml-5.6"
ML_OPTIONS="-H 1500 --gcthreads 6"
...

and on my laptop

$ isabelle build -v Native_Word
Started at Tue Dec 12 13:13:37 GMT+1 2017 (polyml-5.6_x86_64-linux on
cst470s)
ISABELLE_BUILD_OPTIONS="threads=4"

ML_PLATFORM="x86_64-linux"
ML_HOME="/usr/local/Isabelle2017/contrib/polyml-5.6-1/x86_64-linux"
ML_SYSTEM="polyml-5.6"
ML_OPTIONS="-H 1500 --gcthreads 2"
...

Any ideas what could have gone wrong? (In the meantime I am just copying
the heap images generated on the workstation to my laptop, which works
fine.)

cheers

chris
Native_Word

view this post on Zulip Email Gateway (Aug 22 2022 at 16:30):

From: Manuel Eberl <eberlm@in.tum.de>
I think I know what this is.

I had a very similar problem a while ago. I worked for a day or two with
Makarius and David Matthews to figure out what it is and it is related
to some problem with thread synchronisation in Poly/ML.

The problem appears to be resolved in Poly/ML 5.7.1.

To be more precise, Poly/ML has something called a "crowbar thread" that
is launched at the very end of the process (before all the resources are
cleaned up) and terminates the process after 20 seconds if the cleanup
has not completed. However, at some point during the cleanup, one of the
thread synchronisation variables that was still in use was being
deleted, which is undefined behaviour in pthreads.

Apparently, this causes no problems on most systems, but there are rare
cases where you get spurious deadlocks, which cause the crowbar thread
to kick in at some point and ML returns with exit code 1.

Independently from these problems, David Matthews changed the way the
thread synchronisation is done in 5.7.1 and now the problem does not
occur anymore; at least for me.

Cheers,

Manuel


Last updated: Nov 21 2024 at 12:39 UTC