Stream: Archive Mirror: Isabelle Users Mailing List

Topic: [isabelle] Code generation & type classes - inefficient code


view this post on Zulip Email Gateway (Aug 22 2022 at 19:54):

From: Dominique Unruh <unruh@ut.ee>
Hello,

Generation of ML code from Isabelle mimics type classes by passing
around explicit records containing the functions specific to the type class.

E.g., you might get a function "inner_product A_ a b = map (#times A_)
(zip a b)" (roughly).
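
Roughly, a hand-written sketch of this scheme (made-up names, not actual
generator output) looks like this:

(* hypothetical dictionary record for a "times" class,
   passed explicitly to the function that needs it *)
type 'a times = {times : 'a -> 'a -> 'a};

fun inner_product (A_ : 'a times) (a : 'a list) (b : 'a list) : 'a list =
  map (fn (x, y) => #times A_ x y) (ListPair.zip (a, b));

(* the caller supplies the dictionary explicitly *)
val int_times : int times = {times = fn x => fn y => x * y};
val example = inner_product int_times [1, 2, 3] [4, 5, 6];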

This works well; however, I noticed that the resulting code can be very
inefficient because the same records are constructed over and over. For
example, the term "[enumhd,enumhd,enumhd,enumhd,enumhd,enumhd] ::
(bool*bool*bool*bool*bool) list" (where enumhd is some constant we
defined using the type class enum, the details are immaterial here) generates
the code

val code = [ enumhd
     (Enum.enum_prod Enum.enum_bool
       (Enum.enum_prod Enum.enum_bool
         (Enum.enum_prod Enum.enum_bool
           (Enum.enum_prod Enum.enum_bool Enum.enum_bool)))),

... repeated six times ... ]

(Demonstrated in the attached Scratch.thy)

The effect of this is two-fold:

* The generated code becomes much longer and thus potentially compiles
very slowly. (I had a term in my application whose generated code was
~2500 lines, and it became ~150 lines after removing the repeated type
class construction code. The 2500 lines took ~40-60 seconds to compile,
which means that a simple "by eval" takes that long.)

* A runtime overhead is incurred. E.g., the Enum.enum_prod function
above internally produces products of lists over and over again.
This could be prohibitive if Enum.enum_prod (or similar) occurs
in a function that is used in a loop (illustrated below).
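
To illustrate the runtime effect with a hand-written toy example (the
enum record and enum_prod below are simplified stand-ins, not the actual
generated Enum structure): constructing the dictionary eagerly
enumerates the product type, so rebuilding it inside a recursion repeats
that work on every call, whereas a dictionary bound once is only
computed once.

type 'a enum = {enum : 'a list};

(* constructing the dictionary enumerates the product type eagerly *)
fun enum_prod (A_ : 'a enum) (B_ : 'b enum) : ('a * 'b) enum =
  {enum = List.concat (map (fn x => map (fn y => (x, y)) (#enum B_)) (#enum A_))};

val enum_bool : bool enum = {enum = [true, false]};

(* dictionary rebuilt on every recursive call: the enumeration is recomputed each time *)
fun slow 0 = 0
  | slow n = length (#enum (enum_prod enum_bool (enum_prod enum_bool enum_bool))) + slow (n - 1);

(* dictionary built once and shared: the enumeration is computed a single time *)
val shared = enum_prod enum_bool (enum_prod enum_bool enum_bool);
fun fast 0 = 0
  | fast n = length (#enum shared) + fast (n - 1);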

I would have expected code like this:

val code = let val enum1 = (Enum.enum_prod Enum.enum_bool
       (Enum.enum_prod Enum.enum_bool
         (Enum.enum_prod Enum.enum_bool
           (Enum.enum_prod Enum.enum_bool Enum.enum_bool))))

val enumhd1 = enumhd enum1

in [enumhd1,enumhd1,enumhd1,enumhd1,enumhd1,enumhd1] end

This would compute every type class dictionary only once (and if enumhd
itself constructs type classes internally, this would also be done only
once, during the computation of enumhd1).

(The general case would be: "f::args=>result" becomes "fun f
<typeclasses> = let <compute type classes and instantiate constants> in
fn args => <code> end". This way, invoking "val f1 = f <typeclasses>"
already computes all type classes recursively, and then "f1 args" can be
used several times without recomputing them.)
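
A hand-written sketch of that shape (again with made-up names, reusing
the inner_product example from above): the dictionary argument is
consumed first, all class operations are resolved in the let, and the
returned closure can then be applied many times without redoing that
work.

type 'a times = {times : 'a -> 'a -> 'a};

fun inner_product (A_ : 'a times) =
  let
    val times = #times A_   (* resolve the class operation once *)
  in
    fn a => fn b => map (fn (x, y) => times x y) (ListPair.zip (a, b))
  end;

(* val ip = inner_product some_dict    -- dictionaries are computed here, once *)
(* ip xs ys; ip xs' ys'                -- repeated applications reuse them     *)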

My questions:

* Is there a way to tell the code generator to do things more
efficiently? (E.g., the way I described.)

* Perhaps future versions of the code generator could do this? (I
don't know if the code generator is still being developed.)

Best wishes,
Dominique.
Scratch.thy

view this post on Zulip Email Gateway (Aug 22 2022 at 19:55):

From: Florian Haftmann <florian.haftmann@informatik.tu-muenchen.de>
Hi Dominique,

On 21.05.19 at 12:08, Dominique Unruh wrote:

Generation of ML code from Isabelle mimics type classes by passing
around explicit records containing the functions specific to the type
class.

This works well, however, I noticed that the resulting code can be very
inefficient because the same records are constructed over and over.

[...]

My questions:

* Is there a way to tell the code generator to do things more
   efficiently? (E.g., the way I described.)

one approach is to provide explicit local definitions in code equations,
e. g.

lemma [code]:
‹f x = let e = (enum :: …) in …›

Another possibility is to provide monomorphic abbreviations, e. g.

definition min_int :: ‹int => int => int›
  where [code_abbrev]: "min_int = min"

lemma [code]:
"min_int k l = …"

Or a combination of both. It depends on the context what is actually
applicable.

* Perhaps future versions of the code generator could do this? (I
   don't know if the code generator is still being developed.)

Occasionally there are still things happening. But not with a big
agenda in mind.

Hope this helps,
Florian
signature.asc

view this post on Zulip Email Gateway (Aug 22 2022 at 19:57):

From: Dominique Unruh <unruh@ut.ee>
Hi Florian,

thank you for your suggestions. Unfortunately, after a lot of
experimentation I have to conclude that neither approach works in my case:

lemma [code]:
‹f x = let e = (enum :: …) in …›

I have written a conversion that does this automatically and added it
to the code generation preprocessor. While that works (meaning the
resulting code, by manual inspection, has less duplication), it does not
help if the subgoal that is to be solved by "by eval" is large and thus
contains a lot of recomputation of type classes. (In my case, the
subgoal is the result of running the simplifier, and there is no way to
manually produce a better subgoal.) I tried applying the aforementioned
conversion directly to the subgoal (via "apply (tactic ...)"), but then
eval just fails without an error message (which is surprising, since the
goal before the conversion could be solved by eval, albeit slowly). (I
also tried "by normalization" which, to my surprise, raises an exception
related to the allocation of a string above the maximum supported size.)

definition min_int :: ‹int => int => int›
  where [code_abbrev]: "min_int = min"

lemma [code]:
"min_int k l = …"
This approach does not work because a lot of different types occur. (I
am working with typed matrices, so types would be, e.g.,
"('a,'b) bounded" (standing for bounded operator) with 'a,'b ranging
over a number of different types like bit, bit*bit, bit*(bit*bit),
(bit*bit)*bit etc., even within a single calculation, since tensor
products are involved.)

Overall, my conclusion is that I see nothing that can be done on the
user level. To the best of my understanding, a change in the serializer
would be the best solution (an inspection of the Thingol values that are
produced shows that they do not contain the repeated type class
computations; these only appear after serialization). I assume that
these inefficiencies will crop up whenever the enum type class is used,
although possibly not to as extreme a degree as in my case. (It may be
that most use cases therefore do not suffer from this.)

Best wishes,
Dominique.

view this post on Zulip Email Gateway (Aug 22 2022 at 19:59):

From: Florian Haftmann <florian.haftmann@informatik.tu-muenchen.de>
Hi Dominique,

I was not aware that you were using the eval method here.

Maybe something can be done by avoiding enum altogether; is your
specification available somewhere for inspection?

Cheers,
Florian
signature.asc

view this post on Zulip Email Gateway (Aug 22 2022 at 19:59):

From: Dominique Unruh <unruh@ut.ee>
Hi Florian,

sorry, I wasn't clear about my use of eval...

Yes, the code is available on GitHub. But it is part of a rather complex
whole.

The repository is https://github.com/dominique-unruh/qrhl-tool, the most
recent commit (6978604) shows the problem.

The file with the problematic eval is examples/Teleport_Terse.thy
(among others).

The types on which I operate (('a,'b) Bounded and 'a linear_space) are
defined in Bounded_Operators.thy and Complex_Inner_Product.thy,
respectively (in the submodule isabelle-thys/Bounded_Operators).

Those types represent bounded operators from Hilbert space 'a to Hilbert
space 'b, and closed subspaces of 'a, respectively. Furthermore, a
concrete Hilbert space is 'a ell2 (for any type 'a), the type of square
summable functions 'a=>complex.

This is the general setting. In the specific case where 'a,'b are finite
(more specifically, enum), ('a ell2,'b ell2) Bounded can be represented
by matrices (and 'a ell2 subspace as a span of vectors). For this finite
case, I developed the theory QRHL_Code.thy to automate computations
(using the matrix code from AFP/Jordan_Normal_Form under the hood).
Since the types Bounded and linear_space are not just for the finite
case but form a general theory of Hilbert spaces (and similar), I cannot
replace them by something simpler just for the sake of code generation
(e.g., the goal in Teleport_Terse.thy is a verification condition that
arises from the semantics of a quantum language).

The line "apply (simp add: prepare_for_code)" rewrites the verification
condition (which is not directly computable because it refers to
operators operating on variables in an infinite-dimensional quantum
memory) into an expression involving only operators on finite
dimensional Hilbert spaces. Unfortunately, this rewriting leads to a
very large term because a lot of auxiliary operators are introduced to
"move" the applied operations to the right quantum variables (operators
for swapping registers and for "doing associativity").

The resulting eval then struggles with a very large term. But most of
the runtime for compiling and running the code actually comes from the
duplication. (This I know because I tried generating the code for that
subgoal using export_code, and then edited the code manually to avoid
duplications, and the result was much shorter and faster.)

I have tried the following solutions:

* Writing simplification rules that rewrite the monster goal into one
that directly uses only the operations from AFP/Jordan_Normal_Form
(i.e., direct matrix operations). That is, I don't have any code
equations for Bounded and linear_space any more. I committed this in
branch experiment-code-rewrite. Unfortunately, this leads to a term
that contains lots of enum-constructions again, and running eval on
that term takes ages (or doesn't terminate). Also, even if that
approach worked, it would be somewhat unsatisfactory because it means
that, if the end user of my theories writes a new definition of a
function involving operators (e.g., definition square x = x * x),
then they also have to add a simplification rule for expanding it.
The automatic "[code]" declaration that is part of the definition
command won't work.

* I wrote a conversion that removes duplications from terms (using
let) and applied it to the monster subgoal. I committed this in
branch experiment-code-hacks. This has two problems: First, the code
does not get much better because I can only avoid duplication if the
same constant is reused, but not if different constants use the same
type class. Second, it doesn't work: "by eval" fails without giving
an error message (even though before applying the conversion, "by
eval" succeeded). And "by normalization" (which is usually a go-to
tool for debugging failing "by eval"s) fails with "exception Size
raised (line 171 of "./basis/LibrarySupport.sml")", which, if I am
not mistaken, means that it tries to allocate a very large string.

Best wishes,
Dominique.

view this post on Zulip Email Gateway (Aug 22 2022 at 20:00):

From: Dominique Unruh <unruh@ut.ee>
A few things I forgot to mention:

* There is a lot of Scala code etc. in that repository, but for the
problem at hand only the Isabelle theories are relevant (in
isabelle-thys, isabelle-thys/Bounded_Operators, examples).

* The root of the repository contains a ROOTS file that can be used to
open all those theories in Isabelle/jEdit.

* The theories assume Isabelle2019, and AFP should be configured.
(Tested with RC4)

Best wishes,
Dominique.


Last updated: Apr 20 2024 at 08:16 UTC