After
Microsoft Visual Studio 2012 and
GCC 4.8.2, we now provide the results of our performance test suite for C++ unordered associative containers in Clang 3.4. All the tests were compiled and run by Thomas Goyne on a Mac OS X 10.9.1 box with an Intel Core i7-3720QM @2.6GHz,
Turbo Boost and
SpeedStep disabled, release settings
-O3 -std=c++11.
Clang can work with two different standard libraries,
libstdc++-v3 and
libc++: we have tried both libstdc++ 4.8.2 and libc++ 3.4.
libstdc++-v3
 |
Insertion without rehashing, unique elements. |
 |
Insertion with rehashing, unique elements. |
 |
Insertion without rehashing, duplicate elements, Fmax = 1, G = 5. |
 |
Insertion without rehashing, duplicate elements, Fmax = 5, G = 5. |
 |
Insertion with rehashing, duplicate elements, Fmax = 1, G = 5. |
 |
Insertion with rehashing, duplicate elements, Fmax = 5, G = 5. |
 |
Erasure, unique elements. |
 |
Erasure, duplicate elements, Fmax = 1, G = 5. |
 |
Erasure, duplicate elements, Fmax = 5, G = 5. |
 |
Successful lookup, unique elements. |
 |
Unsuccessful lookup, unique elements. |
 |
Successful lookup, duplicate elements, Fmax = 1, G = 5. |
 |
Unsuccessful lookup, duplicate elements, Fmax = 1, G = 5. |
 |
Successful lookup, duplicate elements, Fmax = 5, G = 5. |
 |
Unsuccessful lookup, duplicate elements, Fmax = 5, G = 5. |
As expected, the results are very similar to those obtained with GCC: the only noteworthy difference is Boost.Unordered performance in lookup with unique elements, which is somewhat poorer here —the reasons for this discrepancy are not clear to me.
libc++
 |
Insertion without rehashing, unique elements. |
 |
Insertion with rehashing, unique elements. |
 |
Insertion without rehashing, duplicate elements, Fmax = 1, G = 5. |
 |
Insertion without rehashing, duplicate elements, Fmax = 5, G = 5. |
 |
Insertion with rehashing, duplicate elements, Fmax = 1, G = 5. |
 |
Insertion with rehashing, duplicate elements, Fmax = 5, G = 5. |
 |
Erasure, unique elements. |
 |
Erasure, duplicate elements, Fmax = 1, G = 5. |
 |
Erasure, duplicate elements, Fmax = 5, G = 5. |
 |
Successful lookup, unique elements. |
 |
Unsuccessful lookup, unique elements. |
 |
Successful lookup, duplicate elements, Fmax = 1, G = 5. |
 |
Unsuccessful lookup, duplicate elements, Fmax = 1, G = 5. |
 |
Successful lookup, duplicate elements, Fmax = 5, G = 5. |
 |
Unsuccessful lookup, duplicate elements, Fmax = 5, G = 5. |
In general, the unordered associative containers provided by libc++ are considerably slower than those in libstdc++-v3:
- When duplicate elements are allowed, insertion of a new element in libc++ takes place at the end of the range of equivalent elements (in libstdc++ -v3 it is at the beginning), which incurs the extra cost of traversing the range.
- Erasure is unnecessarily expensive: to determine whether a given node p is located at the beginning of its bucket b, libc++ does a hash→bucket mapping on the predecessor prev of x and checks if the resulting bucket is not the same as b, when it suffices to verify if b points to prev.
- It is harder to explain why lookup (specially with unique elements) is so much slower than in libstdc++-v3: code inspection reveals that the hash→bucket mapping is less efficient in libc++ (it involves one extra logical and operation), but this fact alone seems unlikely to account for the big differences in performance.
I would like the benchmarking code; as I am working on special case (very large data-sets) hash-map implementation.
ReplyDeleteDave
The links are already available scattered across the different blog entries I devoted to this. For your convenience:
ReplyDeleteunique_running_insertion.cpp
non_unique_running_insertion.cpp
unique_scattered_erasure.cpp
non_unique_scattered_erasure.cpp
unique_scattered_lookup.cpp
non_unique_scattered_lookup.cpp
Thanks for these links.
ReplyDeleteDave