Stream: Mirror: Isabelle Users Mailing List

Topic: [isabelle] Merging elimination rules of Claset is slow - ...


view this post on Zulip Email Gateway (Jun 21 2025 at 10:12):

From: "\"Mulder, Ike\"" <cl-isabelle-users@lists.cam.ac.uk>
Hello,

We’ve been seeing performance problems where just the `theory … imports … begin` header is taking anywhere from 5-20 seconds.
The second bottleneck we’re seeing is in the merging of elimination rules of the Claset of two theories.

The merging happens here <https://isabelle.sketis.net/repos/isabelle/file/tip/src/Provers/classical.ML#l604> – merging safe elimination rules specifically turns out to be slow in our case. There are two reasons for that, I think:

1. The complexity of merging the rules seems higher than it should be
2. We apparently have a lot (about 20000) safe elimination rules. Most of these are automatically added when declaring datatypes
The safe elimination rules added by new datatypes have the form `caseI = caseJ ==> R for all i != j`. Therefore, this mainly becomes problematic when defining enums with a lot of cases: the number of safe elimination rules added is quadratic in the number of cases in the enum. The addition of these elimination rules happens here <https://isabelle.sketis.net/repos/isabelle/file/tip/src/HOL/Tools/Ctr_Sugar/ctr_sugar.ML#l1083>.
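For concreteness, a minimal illustrative sketch of the shape of these rules (the fruit type and its constructors are invented here, and the lemma relies on the standard HOL simp setup):

datatype fruit = Apple | Banana | Cherry

(* The datatype package proves pairwise distinctness facts such as
   "Apple ~= Banana", "Apple ~= Cherry", "Banana ~= Cherry" (plus the
   symmetric variants), i.e. quadratically many in the number of
   constructors. The derived safe elimination rules have the shape
   described above: *)

lemma "Apple = Banana ==> R"
  by simp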

As for 1), I think the complexity could be improved because merging simpsets still seems to be fast, even in the presence of simp rules that scale quadratically in the number of cases in an Enum.

For now, I have written a small scrappy plugin that deletes these safe elimination rules for newly defined datatypes. That seems to improve performance a lot.

Best, Ike

view this post on Zulip Email Gateway (Jun 21 2025 at 13:38):

From: Manuel Eberl <manuel@pruvisto.org>
I recall that there were some big performance problems in one of the AFP
entries on p-adic numbers as well. Switching in and out of locale
contexts sometimes takes a full minute. I wonder if this is related to
some of the issues you mention.

Manuel


view this post on Zulip Email Gateway (Jun 30 2025 at 12:20):

From: Lawrence Paulson <lp15@cam.ac.uk>
Plenty of people have large datatype declarations. We need to find a way of handling them that doesn't generate a quadratic amount of anything. As a stopgap, having the option of just relying on the (quadratically many) simprules would be good.

The alternative of generating a simproc is probably good only for large datatypes, since having lots of simprocs floating around causes its own issues.

Larry



view this post on Zulip Email Gateway (Jun 30 2025 at 13:27):

From: Jasmin Blanchette <jasmin.blanchette@ifi.lmu.de>
I believe the behavior of mutually recursive datatypes might be worse than quadratic. (For codatatypes, that's definitely the case.) Perhaps Dmitriy could comment on that.

I cannot resist inserting an anecdote here. Last week, I tried out the LeanHammer on the simplest goals I could imagine: the subgoals emerging from an induction proving "xs ~= x # xs". LeanHammer failed to prove "[] ~= [] # xs", because the quadratic distinct rules are not generated as theorems in Lean (presumably because of their quadratic nature).

Lesson: Thinking about "simp" is good, but we should also keep Sledgehammer in mind. :)

Jasmin

--
Prof. Dr. Jasmin Blanchette
Chair of Theoretical Computer Science and Theorem Proving
Ludwig-Maximilians-Universität München
Oettingenstr. 67, 80538 München, Germany
Tel.: +49 (0)89 2180 9341
Web: https://www.tcs.ifi.lmu.de/mitarbeiter/jasmin-blanchette_de.html


view this post on Zulip Email Gateway (Jun 30 2025 at 13:44):

From: Fabian Huch <huch@in.tum.de>
What about generating a function that enumerates the cases of the datatype, and having a single distinctness theorem for that function? Then only the proof of that theorem is expensive, and it can be used by Sledgehammer (but proof search might take a bit longer).

Fabian
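A rough Isabelle sketch of this idea (the names t, C1, C2, C3 and ord_t are invented; note that the proof below still goes through the automatically generated distinct rules, whereas the proposal would presumably establish such a theorem directly during the datatype construction):

datatype t = C1 | C2 | C3

fun ord_t :: "t => nat" where
  "ord_t C1 = 0"
| "ord_t C2 = 1"
| "ord_t C3 = 2"

(* a single theorem in place of the quadratically many distinctness facts:
   equality of constructors reduces to equality of their numbers *)
lemma ord_t_eq: "(x = y) = (ord_t x = ord_t y)"
  by (cases x; cases y) simp_all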


view this post on Zulip Email Gateway (Jun 30 2025 at 13:57):

From: Jasmin Blanchette <jasmin.blanchette@ifi.lmu.de>
You mean something like

myfunc Nil = 0
myfunc (Cons _ _) = 1

? If you want to investigate this, be my guest. :) Otherwise, I'll keep it as a project idea for a BSc thesis.

Jasmin



view this post on Zulip Email Gateway (Jun 30 2025 at 14:53):

From: Fabian Huch <huch@in.tum.de>
Exactly. This sounds like a good thesis project.

Fabian


view this post on Zulip Email Gateway (Jun 30 2025 at 15:27):

From: Tobias Nipkow <nipkow@in.tum.de>
I have the vague memory that that's what the HOL4 guys do: an "ord_t" function, and a threshold for ord_t to kick in and replace the quadratically many rules (which, as Jasmin pointed out, are much better for Sledgehammer).

Tobias


view this post on Zulip Email Gateway (Jun 30 2025 at 15:46):

From: Dmitriy Traytel <cl-isabelle-users@lists.cam.ac.uk>
Hi Jasmin, all,

It is correct that the behavior of the internal type construction for (co)datatypes is worse than quadratic in the numbers of mutual types and type variables. However, the number of constructors does not influence the internal construction much (there one works with a single constructor anyway). For example, for the type below the internal construction takes 56 milliseconds, whereas producing the rest (including 3540 high-level distinctness theorems) takes 27 seconds on my machine (with parallelism disabled for more precise time measurements).

ML ‹Multithreading.parallel_proofs := 0›
declare [[bnf_timing]]
datatype a =
A1 | A2 | A3 | A4 | A5 | A6 | A7 | A8 | A9 | A10 |
B1 | B2 | B3 | B4 | B5 | B6 | B7 | B8 | B9 | B10 |
C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 |
D1 | D2 | D3 | D4 | D5 | D6 | D7 | D8 | D9 | D10 |
E1 | E2 | E3 | E4 | E5 | E6 | E7 | E8 | E9 | E10 |
F1 | F2 | F3 | F4 | F5 | F6 | F7 | F8 | F9 | F10

But distinctness is only one of the culprits (and as discussed might be useful for proof automation): disabling the plugins brings the time down to 8 seconds:

datatype (plugins only:) a = …

Depending on the use case, one can also try to disable plugins selectively.

Dmitriy
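For reference, a hedged example of such selective disabling, assuming the del: form of the plugins option (which plugins can safely be dropped depends on the application; code is used here purely for illustration):

datatype (plugins del: code) example_enum = X1 | X2 | X3 | X4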


view this post on Zulip Email Gateway (Jun 30 2025 at 17:38):

From: Makarius <makarius@sketis.net>
On 21/06/2025 15:35, Manuel Eberl wrote:

I recall that there were some big performance problems in one of the AFP
entries on p-adic numbers as well. Switching in and out of locale contexts
sometimes takes a full minute. I wonder if this is related to some of the
issues you mention.

Performance problems need constructive existence proofs, otherwise they don't exist.

Can you point to specific locations in AFP?

Makarius

view this post on Zulip Email Gateway (Jun 30 2025 at 17:45):

From: Makarius <makarius@sketis.net>
On 20/06/2025 14:17, "Mulder, Ike" (via cl-isabelle-users Mailing List) wrote:

We’ve been seeing performance problems where just the theory … imports … begin header is taking anywhere from 5-20 seconds.

I would not call this slow ...

Many years ago we've seen theory merges > 30 seconds occasionally, without
getting worried.

The second bottleneck we’re seeing is in the merging of elimination rules of
the Claset of two theories.

The merging happens here <https://isabelle.sketis.net/repos/isabelle/file/tip/src/Provers/classical.ML#l604> – merging safe elimination rules specifically turns out to be slow in our case. There are two reasons for that, I think:

  1. The complexity of merging the rules seems higher than it should be
  2. We apparently have a lot (about 20000) safe elimination rules. Most of these are automatically added when declaring datatypes

Whenever there is a performance bottleneck, it needs to be "proven
constructively" by some significant examples. This is important to reproduce
the problem, and cross-check any attempts to improve the situation.

Anything else is just speculation ...

The safe elimination rules added by new datatypes have the form caseI = caseJ ==> R for all i != j. Therefore, this mainly becomes problematic when defining enums with a lot of cases: the number of safe elimination rules added is quadratic in the number of cases in the enum. The addition of these elimination rules happens here <https://isabelle.sketis.net/repos/isabelle/file/tip/src/HOL/Tools/Ctr_Sugar/ctr_sugar.ML#l1083>.

As for 1), I think the complexity could be improved because merging simpsets
still seems to be fast, even in the presence of simp rules that scale
quadratically in the number of cases in an Enum.

For now, I have written a small scrappy plugin that deletes these safe
elimination rules for newly defined datatypes. That seems to improve
performance a lot.

Now the focus has changed to datatypes. This raises a few more questions:

* Is the claset merge operation really slow, or are the involved clasets
just very big?

* Could the datatype package do something differently, with significant
improvements and without too much extra complexity?

* Could your application do something different? Where are the many
datatypes coming from, actually? Can you avoid the expensive merge operations,
e.g. by making the theory graph more linear?
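As a concrete (hypothetical) sketch of the last point: in a diamond-shaped import graph, two independently grown clasets have to be merged, whereas in a linear chain each theory only extends the claset of its parent. The theory names below are invented, and each header would live in its own file:

(* diamond: Use_Both merges the contexts (and clasets) of Defs_A and Defs_B *)
theory Defs_A imports Main begin (* many datatypes *) end
theory Defs_B imports Main begin (* many datatypes *) end
theory Use_Both imports Defs_A Defs_B begin end

(* more linear alternative: no merge of two large clasets *)
theory Defs_B imports Defs_A begin (* many datatypes *) end
theory Use_Both imports Defs_B begin end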

Makarius

view this post on Zulip Email Gateway (Jun 30 2025 at 18:10):

From: Makarius <makarius@sketis.net>
On 30/06/2025 19:36, Makarius wrote:

* Could your application do something different? Where are the many
datatypes coming from, actually?

And more: Are the datatypes genuine datatypes (disjoint sums), or just record
types done with the somewhat bulky datatype_record package?

Makarius

view this post on Zulip Email Gateway (Jun 30 2025 at 18:52):

From: Manuel Eberl <manuel@pruvisto.org>
The entry in question is "Padic_Field", and the theory that is affected
most severely is "Padic_Field_Powers". We talked about this entry quite
a bit when it was first submitted three years ago. Back then the
performance problems were so severe I was wondering whether we could
even accept it.

The entire entry is still pretty slow for what it is, but it's not as excessive as it once was. I asked the author to reduce the context switches as much as possible, and that already helped a lot. I also think some of the other performance tuning you have done to Isabelle since then helped.

As it stands now, there are still two "context" commands in there that
take about 5 s (elapsed) on my machine, namely in line 41 and line
10404. The "lemma" command on line 2837 and the subsequent "blast" call
also take about 2 seconds each, presumably due to the context switch
induced by "(in padic_fields)". As I recall, this used to be almost an
order of magnitude worse.

Of course, 2 or 5 seconds is not dramatic. But in a development that
uses locales quite a bit, these numbers accumulate to an extent that
makes working with such a development interactively (especially during
refactoring) very annoying.

Cheers,

Manuel


view this post on Zulip Email Gateway (Jun 30 2025 at 23:13):

From: Michael Norrish <cl-isabelle-users@lists.cam.ac.uk>
For what it’s worth, it’s only for enumerated types (consisting of only nullary constructors) that we use a conversion to solve equations between them. The conversion works by mapping the constructors into numbers.

Michael


view this post on Zulip Email Gateway (Jul 01 2025 at 03:34):

From: "\"Becker, Hanno\"" <cl-isabelle-users@lists.cam.ac.uk>
Can this be put into the users’ hands? Allow them to replace the autogeneration of (in)equality-related lemmas with a simproc (perhaps through the existing plugin mechanism). This would help applications dealing with large datatypes (especially enumerated types), while not forcing a change where the current approach works fine.

Hanno


view this post on Zulip Email Gateway (Jul 01 2025 at 06:36):

From: Tobias Nipkow <nipkow@in.tum.de>
Hi Michael,

Thanks for the correction. You are using a conversion. One could also just
simplify with the thm

ord_t x = ord_t y ==> x = y

The conversion is for efficiency? In particular because it only fires if x and y
are constructors?

Tobias


view this post on Zulip Email Gateway (Jul 01 2025 at 07:59):

From: Michael Norrish <cl-isabelle-users@lists.cam.ac.uk>
To be honest, I’m not sure of the historic motivation (it was me who wrote the code, but in 2002, and the commit message just says that I preferred this to the unilateral use of your theorem as an iff, which is what had been done before).

Of course, we could make the conversion fire on just the quadratically many constructor combinations (making the term-net bigger and grosser), but in fact we match against any equality on the type and then check that both sides are constants. Only then do we apply the analogue of your ord_t. Looking at the code, I see we only do this when #constructors > 15, a number that I’m sure was pulled out of nowhere. For smaller types, we just generate all the rewrites.

Best wishes,
Michael


view this post on Zulip Email Gateway (Jul 01 2025 at 16:23):

From: Florian Haftmann <florian.haftmann@cit.tum.de>
(This thread covers a lot of topics which are not obviously covered by
the subject. I am replying to the message which opened this particular
thread, in the hope that it won’t add more confusion.)

Plenty of people have large datatype declarations. We need to find a way
of handling them that doesn't generate a quadratic amount of anything.
As a stopgap, having the option of just relying on the
(quadratically many) simprules would be good.

The alternative of generating a simproc is probably good only for large
datatypes, since having lots of simprocs floating around causes its own
issues.

Concerning the quadratic blowup of distinctness rules: there was once a mechanism to generate those rules lazily for code generation, and as early as 2009 we decided to drop it (very likely after a discussion between Stefan Berghofer and me):

hg log -r cda9a931a46b -p

This seems to indicate that generating those rules is not a serious issue, although different tools might still have their own pitfalls when applying those rules efficiently.

Cheers,
Florian



Last updated: Jul 02 2025 at 04:31 UTC