From: Michael Norrish <Michael.Norrish@nicta.com.au>
Wow. That’s a fascinating number. Note of course that the obvious way to store table sizes would not affect sharing of substructure.
Michael
The information in this e-mail may be confidential and subject to legal professional privilege and/or copyright. National ICT Australia Limited accepts no liability for any damage caused by this email or its attachments.
From: Makarius <makarius@sketis.net>
On Thu, 7 May 2015, Michael Norrish wrote:
On 7 May 2015, at 19:23, Makarius <makarius@sketis.net> wrote:
On Thu, 7 May 2015, Michael Norrish wrote:
125000 live tables in memory. Tables can get big (motivating the
desire to avoid O(n) operations), but I’d be surprised if a running
Isabelle really had that many tables around.My guess is that we normally have millions of live tables around, and
it is important that their substructure is sharable as much as
possible.Wow. That’s a fascinating number.
It only gives a hint about the order of data in the main context
containers. Symtab.table and variants are almost as pervasively used as
plain List.list; e.g. a graph uses that for its adjacency relation.
This all works well and nicely, because the 2-3 tree data structure is a
bit better than the more commonly known read-black tree. In the Scala
library they have the latter, and it causes so many problems that are
absent in our fine ML world.
Note of course that the obvious way to store table sizes would not
affect sharing of substructure.
It is still a waste of resources: heap space and (equally important)
source space in the implementation. The size is not needed for the data
structure, so why should it be stored.
Makarius
From: Michael Norrish <Michael.Norrish@nicta.com.au>
Is there any chance the signature TABLE (from src/Pure/General/table.ML) could be extended to support a constant time
size : 'a table -> int
operation? At the moment, it seems as if I have to calculate this “by hand” (perhaps with fold), and with O(n) complexity.
Michael
The information in this e-mail may be confidential and subject to legal professional privilege and/or copyright. National ICT Australia Limited accepts no liability for any damage caused by this email or its attachments.
From: Makarius <makarius@sketis.net>
This is indeed the canonical way to do it:
fun size tab = Symtab.fold (fn _ => fn n => n + 1) tab 0
The canonical fold combinator for this (and other data structures)
subsumes many applications, without having to saturate signatures
unnecessarily. The above size definition could be added (probly under the
name "lentgh"), but in practice it is hardly ever needed, and canonical
fold expressions work well.
Of course that operation is linear in the size, just like List.length. The
reasons for not storing local sizes in the data structure for constant
size operations are the same as for lists: extra redundancy that needs to
be maintained at run-time and requires more space in memory.
Our Table implementation is almost as fundamental as List. It is at the
bottom of large-scale data management. Any change down there will impact
almost everything, and many people will have to get 32 GB machines at
last.
If you need an augmented version of Table, you can look at these examples
how it is done:
~~/src/Pure/General/change_table.ML
~~/src/Pure/General/graph.ML
Makarius
From: Michael Norrish <Michael.Norrish@nicta.com.au>
I appreciate the desire not to meddle with something fundamental, but I don’t think the concern about wasted space is justified. Assuming an extra int is going to take 8 bytes to store, in order to waste even 1MB (which is in turn just 1/16000th of a 16GB machine), you’d have to have 125000 live tables in memory. Tables can get big (motivating the desire to avoid O(n) operations), but I’d be surprised if a running Isabelle really had that many tables around.
Thanks for the pointers to the extended table examples.
Michael
The information in this e-mail may be confidential and subject to legal professional privilege and/or copyright. National ICT Australia Limited accepts no liability for any damage caused by this email or its attachments.
From: Makarius <makarius@sketis.net>
My guess is that we normally have millions of live tables around, and it
is important that their substructure is sharable as much as possible.
2-3 years ago when I made some measurements about the number of persistent
theory values, it was about 0.5 million for big applications. The number
of theory data slots is at the order of 100. This may sound insane, but we
have quite successfully violated old schemes to poke around destructively
in mutable global storage.
I need to revisit this once again soon, to see if we can extend the 32bit
age a bit further. Many users today have small mobile devices instead of
proper workstations. So the question will again be what can be taken
away, rather than added at the very bottom.
Makarius
Last updated: Nov 21 2024 at 12:39 UTC