Monday, November 24, 2008

Optimizing red-black tree color bits

C++ associative containers are almost universally implemented on top of a structure known as a red-black tree. An rb tree node has approximately the following look:

‎enum color_type{red=false,black=true};

‎template<typename T>
‎struct rb_node
‎{
‎ color_type color;
‎ rb_node* parent;
‎ rb_node* left;
‎ rb_node* right;
‎ T value;
‎};

The color of a node, from which the data structure receives its name, is used to implement some algorithms to keep the tree balanced. Note that the information conveyed by color is exactly one bit; yet, this member takes at least one char of storage and, in most platforms, much more than that because of alignment issues. Increasingly more libraries implement an optimization technique that is able to get rid of the extra storage demanded by color by embedding the information bit into one of the additional node fields. As this technique does not appear to be much documented on the Internet, this entry describes it in detail. Much of the material draws from the source code of Boost.MultiIndex. Some acquantaince with Boost.MPL is welcome to understand what follows.

To embed a bit into the representation of a pointer we need to use in place of the pointer an unsigned integer type with the same size. In most 32bit platforms this type is unsigned int, but the standard does not guarantee this, or even that such an integral type exists. We assume that we have the following utilities at our disposal:

‎typedef ... has_uintptr_type;
‎typedef ... uintptr_type;

where has_uintptr_type is a Boost.MPL boolean indicating whether an unsigned integral type exists with exactly the same size as a pointer, and if so uintptr_type is such type. We will provide later an implementation for these.

Alignment considerations dictate that many types must lie in memory addresses that are multiple of some small integer greater than 1; in particular, it is extremely usual that the alignment of a type T is even (i.e. a multiple of 2), in which case the least significant bit of the representation of pointers to T is always zero. And this is precisely the situation in which we can reuse this bit to embed the color information:

‎template<typename T,bool ColorBitCompression=true>
‎class rb_node:
‎ public boost::mpl::if_c<
‎ ColorBitCompression&&
‎ has_uintptr_type::value&&
‎ (boost::alignment_of<compressed_rb_node_header>::value%2==0),
‎ compressed_rb_node_header,
‎ regular_rb_node_header
‎ >::type
‎{
‎ T value_;

‎public:
‎ T& value(){return value_;}
‎ const T& value()const{return value_;}
‎};

rb_node uses a header with the color bit impression technique when:

  1. This is requested by the user via ColorBitCompression.
  2. There is an unsigned integer with which pointers can be represented.
  3. The alignment of headers is even.

Otherwise a regular header is used. The implementation of the latter is entirely obvious:

‎class regular_rb_node_header
‎{
‎ typedef regular_rb_node_header* pointer;

‎ color_type color_;
‎ regular_rb_node_header* parent_;
‎ regular_rb_node_header* left_;
‎ regular_rb_node_header* right_;

‎public:
‎ color_type& color(){return color_;}
‎ color_type color()const{return color_;}
‎ pointer& parent(){return parent_;}
‎ pointer parent()const{return parent_;}
‎ pointer& left(){return left_;}
‎ pointer left()const{return left_;}
‎ pointer& right(){return right_;}
‎ pointer right()const{return right_;}
‎};

As for compressed_rb_node_header, the most interesting part lie in the use of proxy classes color_ref and parent_ref to isolate the user from the representation of the color information and the parent pointer as a merged uintptr_type:

‎class compressed_rb_node_header
‎{
‎ typedef compressed_rb_node_header* pointer;
‎ struct color_ref
‎ {
‎ color_ref(uintptr_type* r_):r(r_){}

‎ operator color_type()const
‎ {
‎ return color_type(*r&uintptr_type(1));
‎ }

‎ color_ref& operator=(color_type c)
‎ {
‎ *r&=~uintptr_type(1);
‎ *r|=uintptr_type(c);
‎ return *this;
‎ }

‎ color_ref& operator=(const color_ref& x)
‎ {
‎ return operator=(x.operator color_type());
‎ }

‎ private:
‎ uintptr_type* r;
‎ };
‎ struct parent_ref
‎ {
‎ parent_ref(uintptr_type* r_):r(r_){}

‎ operator pointer()const
‎ {
‎ return (pointer)(void*)(*r&~uintptr_type(1));
‎ }

‎ parent_ref& operator=(pointer p)
‎ {
‎ *r=((uintptr_type)(void*)p)|(*r&uintptr_type(1));
‎ return *this;
‎ }

‎ parent_ref& operator=(const parent_ref& x)
‎ {
‎ return operator=(x.operator pointer());
‎ }

‎ pointer operator->()const
‎ {
‎ return operator pointer();
‎ }

‎ private:
‎ uintptr_type* r;
‎ };

‎ uintptr_type parentcolor_;
‎ pointer left_;
‎ pointer right_;

‎public:
‎ color_ref color(){return color_ref(&parentcolor_);}
‎ color_type color()const{return color_type(parentcolor_&std::size_t(1ul));}
‎ parent_ref parent(){return parent_ref(&parentcolor_);}
‎ pointer parent()const
‎ {return (pointer)(void*)(parentcolor_&~uintptr_type(1));}
‎ pointer& left(){return left_;}
‎ pointer left()const{return left_;}
‎ pointer& right(){return right_;}
‎ pointer right()const{return right_;}
‎};

A complete example program is provided. In typical 32bit architectures, color bit compression makes the size of rb_node<T> reduce by 4 bytes.

This color bit compression technique introduces some penalty when accessing and modifying the color and parent information. This penalty, however, is entirely negligible. What is more, the reduction of size brought about by color bit compression usually results in a performance speedup for associative containers taking advantage of this trick: as nodes are smaller, more of them can be stored in L1 cache, which accelerates execution noticeably.

One final remark about this technique: as the parent pointer is no longer declared as a regular pointer, but rather represented by a masked integer, most C++ garbage collectors will not recognize the implicit pointer, which in principle can cause problems. All is fine, however, as it is only the parent that is masked: the right and left pointers are stored as genuine pointers, and as every node of a red/black tree is reachable only through left and right garbage collectors will not fail.

Thank you to Thorsten Ottosen for discussing this color bit compression technique with me.

3 comments :

  1. Nice article Joaquín. This technique can also be applied in AVL trees (Boost.Intrusive) but you need 2 bits to store balance information. Since in 32 bit architectures pointers are usually properly aligned, compression is also applicable.

    ReplyDelete
  2. Joaquin, I like a lot your articles. Respect to this one, I think we can propose a library that will encapsulate all this stuff. The idea is to define generic pointer_ref and bit_ref template classes. Define compressed_bit_pointer and regular_bit_pointer template classes and a selector class based on your conditions. In this way the user will not need to define himself the compressed and regular classes.

    I have started a thread on the Boost ML. Please join it.

    ReplyDelete
  3. 1) If you use an uintptr_type instead of an uintptr_type*, this can be accomplished much more easily using bit fields: https://gist.github.com/anonymous/0fb5a5d0c98192131476 -- you just need to dereference in parent_ref, whereas color_ref is no longer needed. The compiler does the rest for you.

    2) On AMD64, the current version of the standard specifies that virtual addresses are 48 bits long. This means that you can safely* use the 16 most significant bits in a pointer to store other stuff. Plenty of space!

    * until newer machines come out that use more than that....

    ReplyDelete