From: Martin Raszyk <m.raszyk@gmail.com>
Dear Isabelle fellows,
I would like to report an inefficiency in the implementation of sets using
RBTs. When computing the union of two sets represented using RBTs, the
approach of Appel [1] is used. Section 6 of this paper presents three ways
of computing the union and one of them is chosen dynamically by
approximating the number of elements in the sets by the black-node height
of the respective RBTs.
However, such an approximation can be quite imprecise. As an example, we
can consider computing the union of n disjoint sets of l integers each,
resulting in a set of n * l integers. At the i-th step, we compute the
union of a set with l elements and a set with i * l elements. Let us fix n
= l = 200. Then at the 163-th step, the two RBTs with 200 and 32600
elements, respectively, are still approximated to be of the same size which
results in suboptimal performance. My experiment can be reproduced using
the artifact on my GitHub repository:
https://github.com/mraszyk/RBT_height
I believe it might help to track the black-node height of an RBT in each
branching node explicitly. It might also be beneficial to implement the
approach for red-black tree set operations discussed on Wikipedia [2] which
achieves a better asymptotic complexity and is claimed to be optimal in
terms of the number of comparisons. Hence, I would be interested in your
opinion on these optimizations and how they could be best accomplished. I
can offer my help there.
Best regards,
Martin
[1] Appel, A.W.: Efficient verified red-black trees (2011),
http://www.cs.princeton.edu/~appel/papers/redblack.pdf
[2]
https://en.wikipedia.org/wiki/Red%E2%80%93black_tree#Set_operations_and_bulk_operations
From: Tobias Nipkow <nipkow@in.tum.de>
Dear Martin,
I think the right way to go is via the Join/Split approach discussed in your
wikipedia link. We do even have an implementation of that in
HOL/Data_Structures/Set2_Join_RBT.thy.
Tobias
smime.p7s
Last updated: Nov 21 2024 at 12:39 UTC