From: Florian Haftmann <florian.haftmann@cit.tum.de>
There is a reproducible breakdown in the build system:
https://build.proof.cit.tum.de/build?id=038ecd8c-6a3e-4e55-bacd-a455cf7f6b40
A few weeks ago there was a similar breakdown.
What is going on here?
Thanks a lot,
Florian
From: Makarius <makarius@sketis.net>
On 30/12/2025 20:49, Florian Haftmann wrote:
There is a reproducible breakdown in the build system:
https://build.proof.cit.tum.de/build?id=038ecd8c-6a3e-4e55-bacd-a455cf7f6b40
Thanks for the hint. I have now reset the database state of the build server.
A few weeks ago there was a similar breakdown.
What is going on here?
There is some fragility in the design and/or implementation of the cluster.
Rather soon, I need to sit down with Fabian Huch, to sort it out.
Makarius
From: Tobias Nipkow <nipkow@in.tum.de>
Unfortunately the build system has hit the same problem again.
Tobias
On 01/01/2026 22:46, Makarius wrote:
On 30/12/2025 20:49, Florian Haftmann wrote:
There is a reproducible breakdown in the build system:
https://build.proof.cit.tum.de/build?id=038ecd8c-6a3e-4e55-bacd-a455cf7f6b40
Thanks for the hint. I have now reset the database state of the build server.
A few weeks ago there was a similar breakdown.
What is going on here?
There is some fragility in the design and/or implementation of the cluster.
Rather soon, I need to sit down with Fabian Huch, to sort it out.
Makarius
From: Makarius <makarius@sketis.net>
On 02/01/2026 09:15, Tobias Nipkow wrote:
Unfortunately the build system has hit the same problem again.
I have reset everything once more, and tried "isabelle build_task -a"
successfully.
Hopefully this is sufficient for the rest of the Christmas vacation. I will be
mostly unavailable at least until 07-Jan-2026.
Makarius
From: Fabian Huch <huch@in.tum.de>
There are two problems here:
- the underlying problem is that the Isabelle/Scala process uses up a lot more
  memory than it used to, causing OOM errors;
- the build infrastructure cannot handle build processes that have abnormally
  terminated.
To address this, we can:
(1) tune memory settings and investigate why we're using so much more JVM
memory than before -- I'll look into that first.
(2) Make the system more robust in the face of such problems. I need to
sit down with Makarius to discuss this.
(3) Make the scheduling aware of memory limitations. The system was
designed such that this is possible, but it requires more work. I'll
work on that once the immediate problems are solved.
Fabian
On 1/2/26 22:08, Makarius wrote:
On 02/01/2026 09:15, Tobias Nipkow wrote:
Unfortunately the build system has hit the same problem again.
I have reset everything once more, and tried "isabelle build_task -a"
successfully.
Hopefully this is sufficient for the rest of the Christmas vacation. I will be
mostly unavailable at least until 07-Jan-2026.
Makarius
From: Fabian Huch <huch@in.tum.de>
On 1/6/26 12:46, Fabian Huch wrote:
There are two problems here:
- the underlying problem is that the Isabelle/Scala process uses up a lot more
  memory than it used to, causing OOM errors;
- the build infrastructure cannot handle build processes that have abnormally
  terminated.
To address this, we can:
(1) tune memory settings and investigate why we're using so much more JVM
memory than before -- I'll look into that first.
I checked and tuned the system:
- we had a rogue Isabelle process on one of the nodes that refused to
  terminate, eating up ~30GiB of that machine's RAM;
- I reduced the maximal number of concurrent jobs on our lower-memory
  machines (which have 64GiB each).
This worked fine for the past week; however, the memory consumption of
the JVM process is still abnormally high.
Looking at memory snapshots of the build, one of the reasons is the
recent incorporation of Node_Status into the progress: command_timings
are stored as
  command_timings: Map[Command, Command_Timings]
and the command (in the key) contains large markups, e.g. in
init_markups, during the lifetime of these timings: up to ~1GiB during
the build of the distribution, likely significantly more in an AFP
build. I would propose to only keep the Document_ID.Command here, see
the attached patch.
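As a rough illustration of the retention pattern (hypothetical stand-in types,
not Isabelle's actual Command or Command_Timings): keying the map by the full
command keeps its markups reachable for the lifetime of the map, while keying
by the plain id does not.

  // hypothetical types, only to illustrate why the heavy key retains memory
  object Retention_Sketch {
    final case class Heavy_Command(id: Long, init_markups: String)  // stands in for Command
    final case class Command_Timing(elapsed: Double)

    // retains every Heavy_Command (and its markups) as long as the map lives:
    var by_command: Map[Heavy_Command, Command_Timing] = Map.empty

    // retains only the Long id (cf. Document_ID.Command); the command itself can be collected:
    var by_id: Map[Long, Command_Timing] = Map.empty

    def record(command: Heavy_Command, timing: Command_Timing): Unit = {
      // by_command += (command -> timing)   // old shape: keeps command.init_markups alive
      by_id += (command.id -> timing)        // proposed shape: keeps only the id
    }
  }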
Fabian
From: Makarius <makarius@sketis.net>
On 13/01/2026 11:26, Fabian Huch wrote:
Looking at memory snapshots of the build, one of the reasons is the recent
incorporation of Node_Status into the progress: command_timings are stored as
  command_timings: Map[Command, Command_Timings]
and the command (in the key) contains large markups, e.g. in init_markups,
during the lifetime of these timings: up to ~1GiB during the build of the
distribution, likely significantly more in an AFP build.
Thanks for the measurements. I've already suspected a problem exactly there,
but did not know how to do memory profiling on the JVM.
Can you give some hints on how to do that?
Makarius
From: Fabian Huch <huch@in.tum.de>
On 1/13/26 13:54, Makarius wrote:
On 13/01/2026 11:26, Fabian Huch wrote:
Looking at memory snapshots of the build, one of the reasons is the
recent incorporation of Node_Status into the progress:
command_timings are stored as
  command_timings: Map[Command, Command_Timings]
and the command (in the key) contains large markups, e.g. in
init_markups, during the lifetime of these timings: up to ~1GiB
during the build of the distribution, likely significantly more in an
AFP build.
Thanks for the measurements. I've already suspected a problem exactly
there, but did not know how to do memory profiling on the JVM.
Can you give some hints on how to do that?
I usually do this by analyzing multiple heap dumps of the JVM during the
run. Those can be generated either from within the JVM, like so:
  // dump the JVM heap to the given file; include_live = true restricts the dump to live objects
  def dump_heap(file: Path, include_live: Boolean = true): Unit = {
    val mxBean = java.lang.management.ManagementFactory.getPlatformMXBean(
      classOf[com.sun.management.HotSpotDiagnosticMXBean])
    mxBean.dumpHeap(file.absolute.implode, include_live)
  }
or by instructing the JVM from the outside, e.g. via
  'jmap -dump:format=b,file=heap.hprof <pid>'
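For instance, a hypothetical invocation of the in-JVM helper above (using
Isabelle/Scala's Path):
  dump_heap(Path.explode("heap.hprof"))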
These heap dumps can then be analyzed with tools such as Eclipse MAT or
JProfiler, in particular by:
- looking at histograms of object classes and their shallow heap size/retained
  heap size, e.g. here you see that command markups retain ~1GiB of heap:
- analyzing the memory graph through tree views, e.g. here you can see why the
  command markups are retained (there is a delayed event to update the
  progress, where the largest Command key has ~16MiB init_markups):
However, I couldn't get MAT to ignore the weak references of our XML
cache (displayed as the object with the largest retained heap); this makes it
far less useful here.
Perhaps this is possible by tinkering with the options, but I don't know
how -- I used to work with JProfiler, but that is commercial.
Fabian
[attached screenshots: xVNQjnDCttesdZ9N.png, IWTyC0NJNa2j0viE.png]
From: Makarius <makarius@sketis.net>
On 13/01/2026 11:26, Fabian Huch wrote:
Looking at memory snapshots of the build, one of the reasons is the recent
incorporation of Node_Status into the progress: command_timings are stored as
  command_timings: Map[Command, Command_Timings]
and the command (in the key) contains large markups, e.g. in init_markups,
during the lifetime of these timings: up to ~1GiB during the build of the
distribution, likely significantly more in an AFP build. I would propose to
only keep the Document_ID.Command here, see the attached patch.
See now:
changeset: 83825:3ed2781b1cc5
user: wenzelm
date: Wed Jan 14 13:09:47 2026 +0100
files: src/Pure/PIDE/document_status.scala
src/Tools/jEdit/src/timing_dockable.scala
description:
more frugal persistent data, notably for batch builds --- proposed by Fabian
Huch, after Java heap measurements (see also 3f6280aedbcc, a110e7e24e55,
9cc5d77d505c);
It is your original change, with my explanation from the history. I've added a
few more changes on top, but they don't make a fundamental difference.
Note that I did not repeat any performance measurements. I want to do this
eventually, also to see if other problems are present.
Makarius
From: Makarius <makarius@sketis.net>
On 14/01/2026 21:22, Makarius wrote:
See now:
changeset: 83825:3ed2781b1cc5
user: wenzelm
date: Wed Jan 14 13:09:47 2026 +0100
files: src/Pure/PIDE/document_status.scala src/Tools/jEdit/src/timing_dockable.scala
description:
more frugal persistent data, notably for batch builds --- proposed by Fabian
Huch, after Java heap measurements (see also 3f6280aedbcc, a110e7e24e55,
9cc5d77d505c);
It is your original change, with my explanation from the history. I've added a
few more changes on top, but they don't make a fundamental difference.
Note that I did not repeat any performance measurements. I want to do this
eventually, also to see if other problems are present.
I have now done some measurements with "isabelle build -o threads=8 -j2 -a -f"
on my Linux box (16 cores), using the included patch for the Isabelle/Scala
implementation of "isabelle build".
The Scala/Java process uses these settings:
  ISABELLE_TOOL_JAVA_OPTIONS="-Djava.awt.headless=true -Xms512m -Xmx4g -Xss16m -XX:+UseZGC -XX:+ZGenerational -XX:SoftMaxHeapSize=2g"
Attached is raw data from the parent of 3ed2781b1cc5 (1), and from
3ed2781b1cc5 (2). The diagram.png is from gnuplot:
  plot '1.dat' using 1:2 smooth sbezier title "1 heap_size" noenhanced, \
       '1.dat' using 1:3 smooth sbezier title "1 heap_used" noenhanced, \
       '2.dat' using 1:2 smooth sbezier title "2 heap_size" noenhanced, \
       '2.dat' using 1:3 smooth sbezier title "2 heap_used" noenhanced
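For reference, the data columns appear to be elapsed time, heap size and heap
used. A minimal sketch of how such samples could be collected with the standard
MemoryMXBean -- this is only an assumption about the shape of the measurement
patch, not the attached patch itself:

  import java.lang.management.ManagementFactory

  // emit one line per interval: elapsed seconds, committed heap (bytes), used heap (bytes)
  def sample_heap(interval_ms: Long = 1000): Thread = {
    val start = System.currentTimeMillis()
    val thread = new Thread(() => {
      val memory = ManagementFactory.getMemoryMXBean
      while (true) {
        val usage = memory.getHeapMemoryUsage
        val t = (System.currentTimeMillis() - start) / 1000.0
        println(s"$t ${usage.getCommitted} ${usage.getUsed}")
        Thread.sleep(interval_ms)
      }
    })
    thread.setDaemon(true)
    thread.start()
    thread
  }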
Conclusion: There is nothing interesting to be seen here. The change hardly
makes a difference.
It could mean that occurrences of Document_ID.Command and Command are in
1-to-1 correspondence on the heap, i.e. no stored Command is later updated/copied.
Or it could mean that I've got something wrong in the experimental setup.
Makarius
ch
1.dat.xz
2.dat.xz
diagram.png
From: Fabian Huch <huch@in.tum.de>
On 1/17/26 23:34, Makarius wrote:
On 14/01/2026 21:22, Makarius wrote:
See now:
changeset: 83825:3ed2781b1cc5
user: wenzelm
date: Wed Jan 14 13:09:47 2026 +0100
files: src/Pure/PIDE/document_status.scala src/Tools/jEdit/src/timing_dockable.scala
description:
more frugal persistent data, notably for batch builds --- proposed by
Fabian Huch, after Java heap measurements (see also 3f6280aedbcc,
a110e7e24e55, 9cc5d77d505c);
It is your original change, with my explanation from the history. I've added
a few more changes on top, but they don't make a fundamental difference.
Note that I did not repeat any performance measurements. I want to do this
eventually, also to see if other problems are present.
I have now done some measurements with "isabelle build -o threads=8 -j2 -a -f"
on my Linux box (16 cores), using the included patch for the Isabelle/Scala
implementation of "isabelle build".
The Scala/Java process uses these settings:
  ISABELLE_TOOL_JAVA_OPTIONS="-Djava.awt.headless=true -Xms512m -Xmx4g -Xss16m -XX:+UseZGC -XX:+ZGenerational -XX:SoftMaxHeapSize=2g"
Attached is raw data from the parent of 3ed2781b1cc5 (1), and from
3ed2781b1cc5 (2). The diagram.png is from gnuplot:
  plot '1.dat' using 1:2 smooth sbezier title "1 heap_size" noenhanced, \
       '1.dat' using 1:3 smooth sbezier title "1 heap_used" noenhanced, \
       '2.dat' using 1:2 smooth sbezier title "2 heap_size" noenhanced, \
       '2.dat' using 1:3 smooth sbezier title "2 heap_used" noenhanced
Conclusion: There is nothing interesting to be seen here. The change hardly
makes a difference.
It could mean that occurrences of Document_ID.Command and Command are in
1-to-1 correspondence on the heap, i.e. no stored Command is later
updated/copied.
Hm. I had measured a small number of samples (since heap snapshots are
expensive and large) before and after the change and looked at the heap
composition. But it is possible that the (consistently) lower heap size
was coincidental, and the extended lifetime of these Command markups did
not actually matter -- I'll have to figure out how to properly deal with
our opaque cache in MAT.
Or it could mean that I've got something wrong in the experimental setup.
One problem with this experimental setup is that the current heap size is
not a good metric for such a high-throughput application (most of the memory
counted as used is actually dead objects awaiting collection).
Also, the smoothing hides what is going on, e.g. when the heap limit is
hit (cf. plot for 1 below). I would rather use the MXBeans of the
garbage collector to obtain GC events and plot a proper peak-to-peak curve.
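A rough sketch of that idea (not existing Isabelle code; it assumes HotSpot's
com.sun.management notification API): register a listener on each collector
bean and log the used heap after every GC, which yields the post-collection
values for a peak-to-peak plot.

  import java.lang.management.ManagementFactory
  import javax.management.{Notification, NotificationEmitter, NotificationListener}
  import javax.management.openmbean.CompositeData
  import com.sun.management.GarbageCollectionNotificationInfo
  import scala.jdk.CollectionConverters._

  // register a listener on every garbage collector and print the used heap after each GC
  def listen_gc_events(): Unit =
    for (gc <- ManagementFactory.getGarbageCollectorMXBeans.asScala) {
      gc.asInstanceOf[NotificationEmitter].addNotificationListener(
        new NotificationListener {
          override def handleNotification(n: Notification, handback: Object): Unit =
            if (n.getType == GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION) {
              val info =
                GarbageCollectionNotificationInfo.from(n.getUserData.asInstanceOf[CompositeData])
              val used_after =
                info.getGcInfo.getMemoryUsageAfterGc.asScala.values.map(_.getUsed).sum
              println(info.getGcAction + ": used after GC = " + (used_after >> 20) + " MiB")
            }
        }, null, null)
    }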
Fabian