We do a formal analysis of the cache-related behavior of traversing a levelorder_vector, a linear array of elements laid out according to the breadth-first order of a binary tree. Ignoring roundoff effects, let us assume that the array has *n* = 2^{d} − 1 elements and the CPU provides a cache system of *N* lines capable of holding *L* contiguous array elements each. We further assume that *N* > *d*. Under these conditions, the elements of level *i*, being contiguous, are paged to 2^{i}/*L* lines.
The traversal of the array corresponds to the in-order sequence of the tree implicitly encoded, and it is easy to see that this meets the elements of each level *i* exactly in the order they are laid out in memory. So, given that *N* > *d*, i.e. the CPU can hold at least as many lines in cache as there are levels in the tree, an optimal caching scheduler can keep the currently active line of every level resident at the same time, and will result in only 2^{i}/*L* misses per level, or *n*/*L* overall. The number of cache misses *per element* during traversal is then constant and equal to 1/*L*, which is the minimum attainable for *any* traversal scheme (like the simplest one consisting of walking the array linearly from first to last).
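To make the argument concrete, here is a minimal sketch (not part of levelorder_vector itself, and with assumed toy parameters *d* = 16 and *L* = 16) that lays out the *n* = 2^{d} − 1 implicit nodes in breadth-first order, runs the in-order traversal, and counts the misses of a scheduler that keeps one cache line per level resident, which *N* > *d* makes feasible:

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

const std::size_t d = 16;                        // tree depth (assumed toy value)
const std::size_t L = 16;                        // array elements per cache line (assumed toy value)
const std::size_t n = (std::size_t(1) << d) - 1; // n = 2^d - 1 elements

std::vector<std::size_t> current_line(d, std::size_t(-1)); // resident line of each level
std::size_t              misses = 0;

std::size_t level_of(std::size_t i)   // level of node i in the breadth-first layout
{
  std::size_t lvl = 0;
  while((std::size_t(1) << (lvl + 1)) <= i + 1) ++lvl;
  return lvl;
}

void visit(std::size_t i)
{
  std::size_t line = i / L, lvl = level_of(i);
  if(current_line[lvl] != line){      // level lvl has moved on to its next line: one miss
    current_line[lvl] = line;
    ++misses;
  }
}

void inorder(std::size_t i)           // children of node i sit at 2i+1 and 2i+2
{
  if(i >= n) return;
  inorder(2 * i + 1);
  visit(i);
  inorder(2 * i + 2);
}

int main()
{
  inorder(0);
  std::cout << "misses:             " << misses            << "\n"
            << "n/L:                " << n / L             << "\n"
            << "misses per element: " << double(misses) / n << "\n";
}
```

The count comes out slightly above *n*/*L* because adjacent levels can share a cache line, which is precisely the roundoff effect ignored above; the misses-per-element figure still stays at roughly 1/*L*.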
The question remains of whether we can justifiably hold the assumption that *N* > *d* in modern architectures. In the Intel Core family, for instance, the L1 data cache is 32 KB (per core) and the line size is 64 B, so *N* = 512. On the other hand, the maximum *d* possible in 64-bit architectures is precisely 64, therefore the condition *N* > *d* is indeed met by a large margin.
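As a quick sanity check of this condition, using the figures just quoted (these values are the ones cited above, not queried from the hardware):

```cpp
#include <cstddef>
#include <iostream>

int main()
{
  const std::size_t cache_size = 32 * 1024;              // L1 data cache, bytes per core
  const std::size_t line_size  = 64;                     // cache line size, bytes
  const std::size_t N          = cache_size / line_size; // number of cache lines
  const std::size_t max_d      = 64;                     // at most 64 levels on 64-bit architectures

  std::cout << "N = " << N << ", max d = " << max_d
            << ", N > d holds: " << std::boolalpha << (N > max_d) << "\n";
}
```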