Comments on Bannalia: trivial notes on themes diverse: Cache-friendly binary search

Thanks for a great blogpost! I've implemented ...

2017-03-07T08:15:43.308+01:00

Thanks for a great blogpost!
I've implemented this layout as a generic STL-like associative container:
https://github.com/mikekazakov/eytzinger/
https://kazakov.life/2017/03/06/cache-friendly-associative-container/

Hi Fabio, Yes, this arrangement is called "E...

2016-10-26T10:10:37.769+02:00

Hi Fabio,

Yes, this arrangement is called "Eytzinger layout" in the academic literature. You can, for instance, follow the link below for more info and additional references:

https://arxiv.org/ftp/arxiv/papers/1509/1509.05053.pdf

Hi. I am writing a paper on vectorization (SIMD) o...

2016-10-25T17:52:22.912+02:00

Hi. I am writing a paper on vectorization (SIMD) of various algorithms to search in a sorted array. I like this approach and I would like to add it to the set of algorithm I am testing. Is this algorithm described in any journal publication? Fab

Wow, a great Interview question for senior develop...

2015-07-14T03:30:10.458+02:00

Wow, a great Interview question for senior developers ... Thanks JP

"less cache misses" -> "fewer ca...

2015-07-07T14:25:34.052+02:00

"less cache misses" -> "fewer cache misses"

Corrected, thank you.

And now we need to combine the two algorithms [...]

Yes, that would make for an interesting variation, I might try if time permits.

"less cache misses" -> "fewer ca...

2015-07-07T09:11:21.202+02:00

"less cache misses" -> "fewer cache misses"

And now we need to combine the two algorithms: levelorder_vector to start the search then switch to Ye Olde plain vector when the search radius falls below the cache size (or something like that, to be profiled).

> Seems like Blogger does not have that option....

2015-06-27T11:01:26.540+02:00

> Seems like Blogger does not have that option. Do you think otherwise?

Nope. Seems the only way is to delete and rewrite, as I'm doing for this one...

2015-06-27T10:59:36.207+02:00

This comment has been removed by the author.

Hi Chris, you're right traversing this structu...

2015-06-27T02:39:29.477+02:00

Hi Chris, you're right traversing this structure is bound to be slower than boost::container::flat_set, and I'd like to study this in detail in a future entry, if time permits. As for hash tables, they're definitely faster for lookup (check for instance this entry, where lookup times for MSVC are around 2-3x faster than shown here), but then you lose the ability to traverse the elements in ordered fashion. At the end of the day, data structures are about balancing trade-offs among the different usage scenarios one is interested in.

Thank you for the submission, just included in the...

2015-06-27T02:27:49.089+02:00

Thank you for the submission, just included in the article. I don't know what happened to your previous comment, it does't appear anywhere on my control panel (spam, waiting moderation, etc.)

I don't know if my previous comment made it, b...

2015-06-26T18:59:49.625+02:00

I don't know if my previous comment made it, but here are my results: https://gist.github.com/GrayShade/807e1efd4a42a47e1506 .

This comes at the cost of decreased in-order trave...

2015-06-26T14:49:09.308+02:00

This comes at the cost of decreased in-order traversal. The nice thing about flat_map is that it is ordered such that in-order traversal moves sequentially in memory. Not only cache friendly, but prefetchers love that.

That being said, maybe you don't need to do that often and you spend more time searching, but if that's the case, one should consider if ordering is needed at all, and if not, consider a hash table instead.

Are threaded comments enabled Joaquin? I've j...

2015-06-25T21:57:53.123+02:00

Are threaded comments enabled Joaquin?

I've just done so.

Also, it's possible to edit comments?

Seems like Blogger does not have that option. Do you think otherwise?

Someone else started looking into memory layouts f...

2015-06-25T18:53:11.770+02:00

Someone else started looking into memory layouts for binary search not too long ago.

Because virtual memory is a hardware feature. So t...

2015-06-25T18:39:28.937+02:00

Because virtual memory is a hardware feature. So the prefetching, caching, etc, works in virtual address space, not physical address space.

Are threaded comments enabled Joaquin? I'm not...

2015-06-25T17:23:06.409+02:00

Are threaded comments enabled Joaquin? I'm not sure how I can directly answer to Sudsy. Also, it's possible to edit comments?

@Sudsy typical L1 caches have lines 64 bytes wide. On most OSs VM pages are about 4KB. These are completely different beasts. Note the scale, VM locality does not affect cache since VM works in units orders of magnitude bigger.

One thing I've never understood about algorith...

2015-06-25T16:03:40.579+02:00

One thing I've never understood about algorithms that attempt to optimize for cache effects is how you could expect these algorithms to work when they take place in virtual memory locations, which can be mapped to unpredictable physical memory locations. Since the cache works on physical memory, I would think that many of the desirable performance improvements in such algorithms would go away if contiguous virtual memory is mapped to non-contiguous physical memory.

Arch Linux 3.16 x86_64 Intel Core i7 4790k @4.5GHz...

2015-06-25T14:51:04.038+02:00

Arch Linux 3.16 x86_64 Intel Core i7 4790k @4.5GHz 16GB RAM

GCC 5.1.0 -Wall -O3

std::set;boost::container::flat_set;levelorder_vector
10000;0.942178;0.649732;0.405604
12000;1.0163;0.661393;0.426551
14400;1.09829;0.673412;0.394561
17280;1.1708;0.68414;0.397729
20736;1.22752;0.696192;0.4574
24883;1.3168;0.70917;0.445425
29859;1.39845;0.719152;0.417932
35830;1.52342;0.732551;0.42978
42995;1.54287;0.74962;0.486241
51593;1.61979;0.758761;0.480406
61910;1.68448;0.776161;0.444349
74290;1.76935;0.809774;0.4911
89146;1.8292;0.832541;0.562394
106973;1.93549;0.865164;0.544891
128365;2.07622;0.891607;0.50524
154035;2.23172;0.924304;0.584387
184839;2.354;0.949735;0.678893
221803;2.66305;0.983547;0.641758
266159;2.94522;1.01617;0.635775
319386;3.49813;1.03448;0.771336
383258;3.94718;1.05866;0.847865
459904;4.07259;1.08642;0.929145
551879;4.16393;1.12095;0.917717
662249;5.14943;1.18708;0.857538
794693;5.98453;1.12341;0.848175
953625;6.41291;1.1347;0.830826
1144343;6.82237;1.15969;0.964873
1373204;7.1026;1.20294;1.03471
1647837;7.53148;1.21857;0.967981
1977396;8.0361;1.32692;1.0474
2372866;8.09712;1.41156;1.25784
2847430;8.53848;1.57566;1.74904

Clang 3.6.1 -Wall -O3 (gnu stdlibc++):

std::set;boost::container::flat_set;levelorder_vector
10000;0.993368;0.666357;0.630939
12000;1.07131;0.673273;0.635633
14400;1.13079;0.683148;0.643084
17280;1.213;0.693476;0.658646
20736;1.33367;0.705337;0.656572
24883;1.3532;0.7131;0.66242
29859;1.41948;0.720393;0.671182
35830;1.53041;0.734108;0.680093
42995;1.63618;0.744925;0.686951
51593;1.73344;0.757208;0.693145
61910;1.74293;0.775989;0.70711
74290;1.84671;0.818686;0.733187
89146;1.87382;0.839728;0.748626
106973;1.96041;0.859623;0.765006
128365;2.06532;0.901873;0.782139
154035;2.212;0.923579;0.79369
184839;2.36325;0.950378;0.804458
221803;2.63924;0.988126;0.831224
266159;3.43489;1.01654;0.838551
319386;3.42463;1.03174;0.847947
383258;3.94939;1.06131;0.86878
459904;4.35884;1.08742;0.875501
551879;4.77623;1.13165;0.891176
662249;5.46352;1.15979;0.901064
794693;6.04875;1.13259;0.914436
953625;6.60138;1.15176;0.925438
1144343;6.8115;1.16985;0.943092
1373204;7.16612;1.1905;0.958147
1647837;7.74718;1.22039;0.994402
1977396;8.00013;1.27861;1.03661
2372866;8.50959;1.41785;1.14635
2847430;9.1114;1.56756;1.26845

Clang 3.6.1 -Wall -O3 (LLVM libc++)

std::set;boost::container::flat_set;levelorder_vector
10000;1.20145;0.900542;0.858861
12000;1.21311;0.866906;0.82188
14400;1.24415;0.838789;0.79369
17280;1.46327;0.974148;0.935117
20736;1.47669;0.939277;0.888318
B24883;1.488;0.904221;0.848018
29859;1.53827;0.87751;0.821355
35830;1.78498;1.0131;0.949232
42995;1.72539;0.981349;0.913598
51593;1.76445;0.946652;0.886394
61910;1.7986;0.922748;0.864523
74290;2.04623;1.08704;0.994173
89146;2.15265;1.0754;0.961312
106973;2.14005;1.04145;0.934215
128365;2.25706;1.03695;0.9399
154035;3.4363;1.2096;1.06582
184839;3.12646;1.1852;1.0347
221803;3.5032;1.18023;1.00237
266159;3.7558;1.30898;1.12387
319386;3.51973;1.28644;1.0925
383258;3.62738;1.26987;1.06401
459904;4.03144;1.25995;1.07837
551879;4.82011;1.40394;1.18071
662249;4.98677;1.39467;1.13598
794693;5.61304;1.32911;1.10182
953625;6.1442;1.3126;1.07715
1144343;7.16668;1.44368;1.20948
1373204;6.92954;1.46197;1.23476
1647837;7.50992;1.41724;1.30037
1977396;7.88137;1.43963;1.17976
2372866;8.63361;1.69802;1.46556
2847430;8.58553;1.78567;1.52624

Interesting. GCC clearly wins the race, but note how Clang with stdlibc++ runs faster at the beginning, times get closer as the input grows, then libc++ wins at huge input sets.

Compiled with g++ 4.9.2 on Debian: g++ -Ofast -Wal...

2015-06-25T10:15:53.473+02:00

Compiled with g++ 4.9.2 on Debian:
g++ -Ofast -Wall -Wextra -Werror -std=c++11 -o test levelorder_vector.cpp

The machine is an Intel(R) Core(TM) i7-4790T CPU @ 2.70GHz with 16GB RAM

Binary search:
std::set;boost::container::flat_set;levelorder_vector
10000;1.10664;0.774499;0.533323
12000;1.18407;0.789164;0.555826
14400;1.27004;0.799802;0.54171
17280;1.34722;0.81454;0.530249
20736;1.43015;0.830834;0.594138
24883;1.50467;0.842233;0.596146
29859;1.5937;0.868157;0.566802
35830;1.75523;0.874971;0.5836
42995;1.83315;0.881442;0.647675
51593;1.89309;0.919203;0.639589
61910;1.94726;0.936893;0.614695
74290;2.19871;0.948165;0.651943
89146;2.07348;0.979815;0.735306
106973;2.221;1.01198;0.712115
128365;2.3946;1.058;0.692551
154035;2.4648;1.07647;0.851917
184839;2.80888;1.1326;0.908473
221803;3.40933;1.171;0.866847
266159;3.28181;1.1883;0.843519
319386;4.14999;1.2394;0.951587
383258;4.23496;1.27694;1.00523
459904;4.8146;1.28158;1.1303
551879;5.07588;1.37047;1.06402
662249;5.67157;1.40763;1.13232
794693;6.52337;1.426;1.52635
953625;6.65547;1.37775;1.16741
1144343;7.1171;1.4559;1.24296
1373204;7.27995;1.47189;1.37101
1647837;8.07396;1.88349;1.58652
1977396;8.25508;1.75467;1.60646
2372866;8.49498;1.84866;1.80135
2847430;9.44917;2.07216;2.32611

It has been a bit since I have updated my c++ libr...

2015-06-25T09:48:14.306+02:00

It has been a bit since I have updated my c++ libraries, don't mind the 0x heh

$ clang++ -v
Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
Target: x86_64-apple-darwin13.1.0
Thread model: posix

$ sysctl -n machdep.cpu.brand_string
Intel(R) Core(TM) i7-3740QM CPU @ 2.70GHz

$ clang++ preorder_vector.cpp -std=c++11 -O3
output_clang_11.txt

$ clang++ preorder_vector.cpp -std=c++0x -O3
output_clang_0x.txt

I did a series of runs with g++ 4.{6789} and 5.1 -...

2015-06-25T03:35:29.740+02:00

I did a series of runs with g++ 4.{6789} and 5.1 -O3 on a Intel Core i7 860 @ 2.80GHz with Linux kernel 4.0 from Debian.

The results can be found at http://charm.cs.illinois.edu/~phil/flat-btree/