From: Manuel Eberl <manuel@pruvisto.org>
Hello,
the other day I noticed that polyml-x86_64_32-linux in the version used
in Isabelle 2021-1 has an allocation limit of 2^24-1, i.e. you cannot
have an .array whose size is more than ≈17 million. That seems
surprisingly low to me. Is this intentional, e.g. some restriction
inherent to 64_32?
My current workaround is to just nest the arrays to effectively square
that allocation limit, which works fine. But I'm still curious.
Somewhat relatedly, is there anything in the Isabelle/ML library for
dynamically growing arrays or mutable hash maps etc.?
Cheers,
Manuel
From: Makarius <makarius@sketis.net>
On 09/05/2022 16:19, Manuel Eberl wrote:
Hello,
the other day I noticed that polyml-x86_64_32-linux in the version used in
Isabelle 2021-1 has an allocation limit of 2^24-1, i.e. you cannot have an
.array whose size is more than ≈17 million. That seems surprisingly low to me.
Is this intentional, e.g. some restriction inherent to 64_32?
This is a limit of the Poly/ML runtime system. David Matthews can explain the
details.
Abstractly, certain limitations make things more performant, in the space of
possibilities that are important for symbolic applications like Isabelle.
Somewhat relatedly, is there anything in the Isabelle/ML library for
dynamically growing arrays or mutable hash maps etc.?
Why use arrays and mutable hash maps at all?
These old things were already obsolete when I learned ML for the first time
from the first edition of Larry's book (1992).
Makarius
From: Manuel Eberl <manuel@pruvisto.org>
I'm replaying some big SAT proofs in Isabelle/HOL, and mutable arrays
are the natural data structure for that.
When I say "big", I mean that the formula and the proofs are in the
range of hundreds of Megabytes, with millions of clauses and variables.
This actually works surprisingly well. I replayed a proof of a 27 MB
proof on a formula with 6 million clauses in less than 20 seconds.
Manuel
From: David Matthews <dm@prolingua.co.uk>
Every cell in Poly/ML has a word before it that is needed for the
garbage collector. The top byte contains flag bits and the remainder is
the length. That means that cells in native 32-bit systems or 32-in-64
are limited to 2^24-1 words. For native 64-bit systems, though, the
length occupies 56 bits so if you use native 64-bits you could have
arrays of up to 2^56-1 words, but you're likely to run out of memory
long before you reach that.
David
From: Manuel Eberl <manuel@pruvisto.org>
Thanks for the explanation!
@Makarius: I am currently trying to figure out how to properly generate
a text file from Isabelle, basically the same thing that the export_code
command does. I couldn't find any documentation of the ML interface in
the Implementation Manual, but my guess is that I have to use
"Export.export" and possibly "Generated_Files.add_files" (not sure what
the second one does, but the code generator uses it).
I noticed, however, that both of these use just a single string for the
file content, which means that this whole mechanism is limited to a file
size of 16 MB in Poly/ML x86_64_32.
Now, this is not really a problem for me because my files are only a few
MB, but I thought I'd mention this because it might cause some problems
in the future for other people.
Manuel
From: Makarius <makarius@sketis.net>
First note that the limit is 64 MB, not 16 MB.
Since we have seen applications hitting the 64 MB boundary occasionally, I've
spent several days to introduce a scalable type Bytes.T in many places as
above. The present situation is Isabelle/c8263ac985e1 (just for the record).
In Sep-2022 we will start moving towards the next official release, where this
will become standard.
Makarius
Last updated: Jan 04 2025 at 20:18 UTC