[isabelle] Merging theory definitions calls normalise too... · Mirror: Isabelle Users Mailing List

Stream: Mirror: Isabelle Users Mailing List

Topic: [isabelle] Merging theory definitions calls normalise too...

Email Gateway (Jun 21 2025 at 10:12):

From: "\"Mulder, Ike\"" <cl-isabelle-users@lists.cam.ac.uk>
Hello,

We’ve been seeing performance problems where just the theory _ imports _ begin header is taking anywhere from 5-20 seconds.
One of the two bottlenecks we’ve seen is the merging of definitions of two theories.

Merging definitions happens in this function<https://isabelle.sketis.net/repos/isabelle/file/tip/src/Pure/defs.ML#l254>, which adds the definitions in one theory to that of the other if they are not already present.
Whenever a new definition is found, the dependencies function is called, which in turn calls normalize.
The problem is that normalize is expensive, folding over all definitions to find reducts that must be reduced further (I think).
This makes it at least O(#definitions). normalize is called for _every_ newly inserted definition, so the complexity of merge becomes quadratic in the number of definitions.

I have experimented with removing the call to normalize from dependencies, then normalizeing after all definitions have been added. Note that one must also add an explicit call to normalize at the other callsite of dependencies here<https://isabelle.sketis.net/repos/isabelle/file/tip/src/Pure/defs.ML#l279>. That seems to work and speed things up significantly.

Best, Ike

Email Gateway (Jun 30 2025 at 12:32):

From: Lawrence Paulson <lp15@cam.ac.uk>
The following paper is possibly relevant to the issue, although its authors are not the authors of the code:

https://eprints.whiterose.ac.uk/id/eprint/191505/1/Consistent_Foundation_IsabelleHOL_JAR_2019.pdf

A consistent foundation for Isabelle/HOL<https://eprints.whiterose.ac.uk/id/eprint/191505/1/Consistent_Foundation_IsabelleHOL_JAR_2019.pdf>
a [•] [•] [•] a [•]
eprints.whiterose.ac.uk
Handling definitions correctly is critical to soundness, so we need to proceed carefully here.

From: cl-isabelle-users-request@lists.cam.ac.uk <cl-isabelle-users-request@lists.cam.ac.uk> on behalf of "Mulder, Ike" <cl-isabelle-users@lists.cam.ac.uk>
Sent: 20 June 2025 13:15
To: cl-isabelle-users@lists.cam.ac.uk <cl-isabelle-users@lists.cam.ac.uk>
Subject: [isabelle] Merging theory definitions calls normalise too often - performance issue loading multiple theories

Hello,

We’ve been seeing performance problems where just the theory _ imports _ begin header is taking anywhere from 5-20 seconds.

One of the two bottlenecks we’ve seen is the merging of definitions of two theories.

Whenever a new definition is found, the dependencies function is called, which in turn calls normalize.

The problem is that normalize is expensive, folding over all definitions to find reducts that must be reduced further (I think).
This makes it at least O(#definitions). normalize is called for _every_ newly inserted definition, so the complexity of merge becomes quadratic in the number of definitions.

Best, Ike

Email Gateway (Jun 30 2025 at 17:54):

From: Makarius <makarius@sketis.net>
On 20/06/2025 14:15, "Mulder, Ike" (via cl-isabelle-users Mailing List) wrote:

Hello,

We’ve been seeing performance problems where just the theory _ imports _ begin header is taking anywhere from 5-20 seconds.

One of the two bottlenecks we’ve seen is the merging of definitions of two
theories.

Merging definitions happens in this function <https://isabelle.sketis.net/
repos/isabelle/file/tip/src/Pure/defs.ML#l254>, which adds the definitions in
one theory to that of the other if they are not already present.

Whenever a new definition is found, the dependencies function is called,
which in turn calls normalize.

The problem is that normalize is expensive, folding over all definitions to
find reducts that must be reduced further (I think).
This makes it at least O(#definitions). normalize is called for _/every/_
newly inserted definition, so the complexity of merge becomes quadratic in the
number of definitions.

I have experimented with removing the call to normalize from dependencies,
then normalizeing after all definitions have been added. Note that one must
also add an explicit call to normalize at the other callsite of dependencies
here <https://isabelle.sketis.net/repos/isabelle/file/tip/src/Pure/
defs.ML#l279>. That seems to work and speed things up significantly.

Sorry, "it seems to work" is not an acceptable quality standard for changes to
any serious piece of software, and especially not for the Isabelle/Pure kernel.

The module defs.ML has reached a certain local optimum many years ago. It can
certainly be improved further, but that does not proceed by adhoc changes.

In particular, the implementation needs to be reconciled with with work by
Kuncar/Popescu on overloaded definitions (written in LaTeX) and corresponding
formalizations by Gengelbach/Åman Pohjola (written in HOL4). That appears to
be great work, but it does not exactly correspond to the actual defs.ML.

Re-opening that problem is important, but it won't happen now: It could easily
require a few more years.

In the meantime, you can ask yourself if something can be done differently in
the application. Excessive numbers of defs normalization usually happen when
many constants are merely declared, but not properly specified (defined).

And again, performance problems should be accompanied by proper examples, e.g.
ones seen in the Isabelle distribution or AFP.

Makarius

Email Gateway (Jun 30 2025 at 18:12):

From: Makarius <makarius@sketis.net>
On 30/06/2025 14:32, Lawrence Paulson wrote:

The following paper is possibly relevant to the issue, although its authors
are not the authors of the code:

https://eprints.whiterose.ac.uk/id/eprint/191505/1/
Consistent_Foundation_IsabelleHOL_JAR_2019.pdf <https://
eprints.whiterose.ac.uk/id/eprint/191505/1/
Consistent_Foundation_IsabelleHOL_JAR_2019.pdf>

A consistent foundation for Isabelle/HOL <https://eprints.whiterose.ac.uk/id/
eprint/191505/1/Consistent_Foundation_IsabelleHOL_JAR_2019.pdf>
a [•] [•] [•] a [•]
eprints.whiterose.ac.uk

Handling definitions correctly is critical to soundness, so we need to proceed
carefully here.

Yes, and we need to take into account what we have learned about that in the
past 11 years (since Vienna Summer of Logic 2014).

* The work by Kuncar/Popescu is fine, but informal (LaTeX only). It also
deviates from the previous foundations of Isabelle/Pure (and HOL) in many
respects. The worst deviation is the very checker itself --- maybe that is
just an accident.

* The work by Gengelbach/Åman Pohjola
https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITP.2022.15 is
great: it formalizes Kuncar/Popescu under the impression that this is what
Isabelle/Pure really does.

So the next move is to study everything, together with the actual sources of
defs.ML --- which is ML text intended to be read by humans, and only
occasionally to be run by a machine. That might only require some weeks or
months, but the start of that interval is unclear.

Makarius

Last updated: Aug 20 2025 at 20:23 UTC