Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] Creating ML data is slow. No sharing/caching?


view this post on Zulip Email Gateway (Aug 22 2022 at 10:42):

From: "C. Diekmann" <diekmann@in.tum.de>
Dear ML and Isabelle experts,

I observed that my code written in Isabelle got horribly slow after a
small change. Currently, I evaluate my code with

value[code]

Here is a small Isabelle/HOL example that illustrates the issue:

datatype 'a expr = Prim 'a | And "'a expr" "'a expr"

definition add_expr :: "'a expr ⇒ 'a expr list ⇒ 'a expr list"
where "add_expr e es = map (λe'. And e e') es"

1) Constructing a list of expressions is quite fast:
value[code] "map Prim [1..1000]" (*0.339s elapsed time, 0.678s cpu
time, 0.000s GC time*)

2) If I add a small expression to a large list of expressions, it is also fast:
value[code] "add_expr (Prim 42) (map Prim [1..1000])" (*0.604s elapsed
time, 1.201s cpu time, 0.026s GC time*)

3) Constructing one larger expression is also fast:
value[code] "fold And (map Prim [1..10]) (Prim 0)" (*too fast to give
a timing?*)

4) Adding bigger expressions to a list of expressions is more than 10
times slower than (2):
value[code] "add_expr (fold And (map Prim [1..10]) (Prim 0)) (map Prim
[1..1000])" (5.853s elapsed time, 11.495s cpu time, 0.316s GC time)

5) Almost the same result if the expression is written down explicitly:
value[code] "add_expr (And (Prim 10) (And (Prim 9) (And (Prim 8) (And
(Prim 7) (And (Prim 6) (And (Prim 5) (And (Prim 4) (And (Prim 3) (And
(Prim 2) (And (Prim 1) (Prim 0))))))))))) (map Prim [1..1000])"
(*5.715s elapsed time, 11.148s cpu time, 0.316s GC time*)

I played around with this, repeated these small measurements and
shuffled the value statements to make sure that we are not observing
some CPU turbo boost artifact here.

By the way, the performance doesn't seem to get worse if the list of
expressions consists of large expressions:
value[code] "add_expr (fold And (map Prim [1..10]) (Prim 0)) (map (λi.
(fold And (map Prim [1..10]) (Prim i))) [1..1000])" (*5.291s elapsed
time, 6.138s cpu time, 0.366s GC time*)

Why is (4) so slow, compared to (2)?

How can I speed this up? Currently, this renders my theory unusable.

Doesn't ML use sharing or caching? I always assumed that, if I have
expressions e1 and e2, then constructing "And e1 e2" would reuse the
values e1 and e2 and this operation would be independent of the sizes
of e1 and e2.
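
In ML terms, I assumed something like the following sketch of the cost
model (my mental model, not any actual implementation):

ML ‹
  (* assumed cost model: And (e1, e2) allocates a single node holding
     two pointers; e1 and e2 are shared, not copied *)
  datatype 'a expr = Prim of 'a | And of 'a expr * 'a expr;
  val e1 = And (Prim 1, Prim 2);
  val e2 = Prim 3;
  val e = And (e1, e2); (* constant time, independent of the sizes of e1 and e2 *)
›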

Best,
Cornelius

view this post on Zulip Email Gateway (Aug 22 2022 at 10:42):

From: Christian Sternagel <c.sternagel@gmail.com>
Dear Cornelius,

isn't the problem merely that the expression

"fold And (map Prim [1..10]) (Prim 0)"

is evaluated 1000 times? I guess you could speed things up by using
something like:

"let e = fold And (map Prim [1..10]) (Prim 0) in add_expr e (map Prim
[1..1000]"

I did not yet test my intuition though.

cheers

chris

view this post on Zulip Email Gateway (Aug 22 2022 at 10:42):

From: Christian Sternagel <c.sternagel@gmail.com>
Sorry for the noise, I'm wrong. -cheers chris

view this post on Zulip Email Gateway (Aug 22 2022 at 10:43):

From: Andreas Lochbihler <andreas.lochbihler@inf.ethz.ch>
Hi Cornelius,

Your problem is not the evaluation, it's the pretty printing. Evaluations 4 and 5 result
in quite large terms with fairly deep nesting. The Isabelle/ML process sends the whole
syntax tree to Isabelle/jEdit, including type annotations for every subterm and layout
information (suitable positions for line breaks). All this information has to be parsed
and typeset in the output buffer.

If you really want to measure the evaluation time, you should either make sure that your
evaluations result in small terms or use the ML interface which avoids most of the
pretty-printing overhead. For example,

value [code]
"let x = add_expr (fold And (map Prim [1..10]) (Prim 0)) (map Prim [1..1000]) in ()"

is pretty fast (note that HOL's let gets compiled to ML's let, so the let binding is
always evaluated) and so is

definition "four = add_expr (fold And (map Prim [1..10]) (Prim 0)) (map Prim [1..1000])"
ML ‹ @{code four} ›

I do not know the details of the implementation in Isabelle/Scala, so I cannot tell why 2
is much faster than 4 in terms of pretty printing. My guess is that 2 consists of a long
list of values, i.e., you have a fairly flat term structure and special notation for lists
(which suppresses a lot of the auxiliary information passed along with the constants).

Hope this helps,
Andreas

view this post on Zulip Email Gateway (Aug 22 2022 at 10:43):

From: "C. Diekmann" <diekmann@in.tum.de>

Your problem is not the evaluation, it's the pretty printing.

Thanks Andreas! This explains everything.

Cheers,
Cornelius

view this post on Zulip Email Gateway (Aug 22 2022 at 10:44):

From: "C. Diekmann" <diekmann@in.tum.de>
Dear list,

pretty printing can cause quite some overhead. I just wanted to

value[code] "unfolded"
(2174.839s elapsed time, 3308.410s cpu time, 161.539s GC time)

value[code] "let x = unfolded in ()"
(0.561s)

value[code] "map (my_toString) unfolded"
(39.257s)

view this post on Zulip Email Gateway (Aug 22 2022 at 10:44):

From: "C. Diekmann" <diekmann@in.tum.de>
... I just wanted to share those numbers in the previous mail. I never
thought that writing a custom toString function could make such a huge
difference: 30 min vs. 30 sec. The tested data is a ~500-rule iptables
firewall [1].
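
My toString function is specific to the firewall types, but the idea is
just a plain recursive printer. For the expr datatype from my first
mail, a sketch (parametrized over a printer f for the primitives) could
look like this:

fun expr_toString :: "('a ⇒ string) ⇒ 'a expr ⇒ string"
where
  "expr_toString f (Prim a) = f a"
| "expr_toString f (And e1 e2) =
    ''('' @ expr_toString f e1 @ '' AND '' @ expr_toString f e2 @ '')''"

The result is a flat string, so the output machinery only has to print
a list of strings instead of deeply nested terms.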

Cornelius

(and probably I should unify my mail client's key bindings to avoid
those half-finished mails. Sorry for the noise.)

[1] https://github.com/diekmann/Iptables_Semantics paper at FM'15

view this post on Zulip Email Gateway (Aug 22 2022 at 10:44):

From: Gerwin Klein <Gerwin.Klein@nicta.com.au>
We’ve observed similar things in the past with printing large goal states and Simpl terms, especially when lots of abbreviations are active.

It has gotten below the pain threshold for us, I think mainly after Makarius reorganised the whole syntax stack a few years ago, so we haven’t done anything about it in a long time. (Maybe we just got used to it too much).

If someone feels motivated to look at what exactly is taking so long in printing and if there is anything that could be done about it, we’d be happy to help and dig out examples.

Cheers,
Gerwin



view this post on Zulip Email Gateway (Aug 22 2022 at 11:22):

From: Makarius <makarius@sketis.net>
That is occasionally called "hash consing", but it would introduce a
central (synchronized) place for constructing data structures. Not very
efficient in the multicore era ...
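
For illustration, a naive single-threaded sketch of such a central
construction place in Isabelle/ML (a plain list standing in for the
hash table; a real implementation would need a proper table and, on
multicore, synchronization):

ML ‹
  (* naive hash-consing sketch: look up every freshly built node in a
     central pool and return the existing copy if one is found *)
  datatype expr = Prim of int | And of expr * expr;
  val pool : expr list Unsynchronized.ref = Unsynchronized.ref [];
  fun hashcons e =
    (case List.find (fn e' => e' = e) (! pool) of
      SOME e' => e'
    | NONE => (pool := e :: ! pool; e));
  fun mk_And e1 e2 = hashcons (And (e1, e2));
›

Every constructor application then pays for the lookup, and the pool
becomes a point of contention between threads.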

Even in ancient single-core times, this old idea did not become very
popular, probably because most data is rather short-lived and it is better
not to make such a fuss about its allocation.

The Poly/ML runtime system has other means to organize long-lived survivors
on the heap efficiently, by re-introducing sharing on demand. You won't
see that in such micro-benchmarks, though. In AFP/JinjaThreads you do see
the CPU crunching a lot, to fit the resulting heap into the tiny 4 GB
address space: the Monitor/Heap dockable in Isabelle/jEdit gives an
impression of what is going on in the ML RTS.

Makarius

view this post on Zulip Email Gateway (Aug 22 2022 at 11:22):

From: Makarius <makarius@sketis.net>
I've done some ML timing here
http://isabelle.in.tum.de/repos/isabelle/file/Isabelle2015/src/HOL/Tools/value.ML#l58
and here
http://isabelle.in.tum.de/repos/isabelle/file/Isabelle2015/src/HOL/Tools/value.ML#l61
and here
http://isabelle.in.tum.de/repos/isabelle/file/Isabelle2015/src/HOL/Tools/value.ML#l64

The timing for value_maybe_select is about the same as the pretty_term +
writeln together. Looking at the profile of value_maybe_select, there are
a lot of Simplifier operations, which probably belong to the preparation
stage of the code-generator, including preparation of the result before
actual printing.

So my guess is that "let x = add_expr ..." is so much faster because its
result is so small, independently of printing.

In the above timing experiments, Isabelle/Scala is not measured at all.
The front-end is asynchronous. It might burn a lot of CPU cycles on its
own for displaying big output, but that is not directly seen here. I don't
think this particular example is big enough to be a big problem. Other
examples can bomb the front-end as usual.

Makarius

view this post on Zulip Email Gateway (Aug 22 2022 at 11:23):

From: Andreas Lochbihler <andreas.lochbihler@inf.ethz.ch>
Hi Makarius,

The code generator does quite some post-processing of the term output using the simplifier
(by unfolding the equations in code_post). You might be right in that this kind of
post-processing is the main bottleneck here. Technically, this is not part of the pretty
printer, but I would still consider this as a part of pretty-printing after evaluation.

Preprocessing of the code equations can take considerable time in some cases, but AFAIK
Florian implemented a cache for code equations. Thus, a second invocation of "value
[code]" for the same term should not suffer from preprocessing time.

Best,
Andreas

view this post on Zulip Email Gateway (Aug 22 2022 at 11:25):

From: Florian Haftmann <florian.haftmann@informatik.tu-muenchen.de>

Preprocessing of the code equations can take considerable time in some
cases, but AFAIK Florian implemented a cache for code equations. Thus, a
second invocation of "value [code]" for the same term should not suffer
from preprocessing time.

Note that the conversions involving code generation are built like
sandwiches on top of each other, each adding a further level of
preparation and post-polishing. For illustration, try tracking down
the dependencies of e.g. Code_Runtime.dynamic_value. You might
consider plugging in some time measurements there to get an idea of
what is really going on.
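
For instance, something along these lines as a rough starting point,
wrapping the dynamic evaluation entry point as a whole; finer-grained
measurements would go into the layers below it:

ML ‹
  (* rough sketch: timeit prints elapsed/CPU/GC time of the closure;
     this measures evaluation without the printing stage *)
  val t = @{term "add_expr (fold And (map Prim [1..10]) (Prim 0)) (map Prim [1..1000])"};
  val result = timeit (fn () => Code_Evaluation.dynamic_value_strict @{context} t);
›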

Florian


