Thursday, April 4, 2024

A case in API ergonomics for ordered containers

Suppose we have a std::set<int> and would like to retrieve the elements between values a and b, both inclusive. This task is served by operations std::set::lower_bound and std::set::upper_bound:

std::set<int> x=...;

// elements in [a,b]
auto first = x.lower_bound(a);
auto last = x.upper_bound(b);

while(first != last) std::cout<< *first++ <<" ";

Why do we use lower_bound for the first iterator and upper_bound for the second? The well-known STL convention is that a range of elements is determined by two iterators first and last, where first points to the first element of the range and last points to the position right after the last element. This is done so that empty ranges can be handled without special provisions (first == last).

Now, with this convention in mind and considering that

  • lower_bound(a) returns an iterator to the first element not less than a,
  • upper_bound(b) returns an iterator to the first element greater than b,

we can convince ourselves that the code above is indeed correct. The situations where one or both of the interval endpoints are not inclusive can also be handled:

// elements in [a,b)
auto first = x.lower_bound(a);
auto last  = x.lower_bound(b);

// elements in (a,b]
auto first = x.upper_bound(a);
auto last  = x.upper_bound(b);

// elements in (a,b)
auto first = x.upper_bound(a);
auto last  = x.lower_bound(b);

but getting them right requires some thinking.

Boost.MultiIndex introduces the operation range to handle this type of queries:

template<typename LowerBounder,typename UpperBounder>
std::pair<iterator,iterator>
range(LowerBounder lower, UpperBounder upper);

lower and upper are user-provided predicates that determine whether an element is not to the left and not to the right of the considered interval, respectively. The formal specification of LowerBounder and UpperBounder is quite impenetrable, but using this facility, in particular in combination with Boost.Lambda2, is actually straightforward:

// equivalent to std::set<int>
boost::multi_index_container<int> x=...;

using namespace boost::lambda2;

// [a,b] auto [first, last] = x.range(_1 >= a, _1 <= b);
// [a,b) auto [first, last] = x.range(_1 >= a, _1 < b); // (a,b] auto [first, last] = x.range(_1 > a, _1 <= b); // (a,b) auto [first, last] = x.range(_1 > a, _1 < b);

The resulting code is much easier to read and to get right in the first place, and is also more efficient than two separate calls to [lower|upper]_bound   (because the two internal rb-tree top-to-bottom traversals can be partially joined in the implementation of range). Just as importantly, range handles situations such as this:

int a = 5;
int b = 2; // note a > b

// elements in [a,b]
auto first = x.lower_bound(a);
auto last = x.upper_bound(b);

// undefined behavior
while(first != last) std::cout<< *first++ <<" ";

When a > b, first may be strictly to the right of last, and consequently the while loop will crash or never terminate. range, on the other hand, handles the situation gracefully and returns an empty range.

We have seen an example of how API design can help reduce programming errors and increase efficiency by providing higher-level facilities that model and encapsulate scenarios otherwise served by a combination of lower-level operations. It may be interesting to have range-like operations introduced for standard associative containers.