Friday, November 14, 2025

Comparing the run-time performance of Fil-C and ASAN

After the publication of the experiments with Boost.Unordered on Fil-C, some readers asked for a comparison of the run-time performance of Fil-C and Clang's AddressSanitizer (ASAN).

Warning

Please do not construe this article as implying that Fil-C and ASAN are competing technologies within the same application space. Whereas ASAN is designed to detect bugs resulting in memory access violations, Fil-C sports a stricter notion of memory safety including UB situations where a pointer points into a valid memory region that is nonetheless out of bounds with respect to the pointer's provenance. That said, there's some overlap between the two tools, so it's only natural to ask about their relative impact on execution times.
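As a concrete illustration of that difference, consider the following deliberately UB sketch (hypothetical, for exposition only). The access targets valid, live memory, so ASAN will typically not flag it; Fil-C, on the other hand, is expected to trap because the access violates the provenance of p:

#include <cstddef>
#include <cstdlib>

int main()
{
  char* p = static_cast<char*>(std::malloc(16));
  char* q = static_cast<char*>(std::malloc(16));
  // subtracting pointers to different objects is itself UB; it is used
  // here only to guarantee that the access below lands inside q
  std::ptrdiff_t d = q - p;
  p[d] = 'x'; // writes to q[0]: valid, live memory, but out of bounds
              // with respect to p's provenance
  std::free(p);
  std::free(q);
}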

Results

Our previous benchmarking repo has been updated to include results for plain Clang 18, Clang 18 with ASAN enabled, and Fil-C v0.674, all with release mode settings. The following figures show execution times in ns per element for Clang/ASAN (solid lines) and Fil-C (dashed lines) for three Boost.Unordered containers (boost::unordered_map, boost::unordered_flat_map and boost::unordered_node_map) and four scenarios.


[Figures: execution times for running insertion, running erasure, successful lookup and unsuccessful lookup]

In summary:

  • Insertion:
    Fil-C is between 1.8x slower and 4.1x faster than ASAN (avg. 1.3x faster).
  • Erasure:
    Fil-C is between 1.3x slower and 9.2x faster than ASAN (avg. 1.9x faster).
  • Successful lookup:
    Fil-C is between 2.5x slower and 1.9x faster than ASAN (avg. 1.6x slower).
  • Unsuccessful lookup:
Fil-C is between 2.6x slower and 1.4x faster than ASAN (avg. 1.9x slower).

So, results don't allow us to establish a clear-cut "winner". When allocation/deallocation is involved, Fil-C seems to perform better (except for insertion when the memory working set gets past a certain threshold). For lookup, Fil-C is generally worse, and, again, the gap increases as more memory is used. A deeper analysis would require knowledge of the internals of both tools that I, unfortunately, lack.

(Update Nov 16, 2025) Memory usage

By request, we've measured peak memory usage in GB (as reported by time -v) for the three environments and three scenarios (insertion, erasure, combined successful and unsuccessful lookup) involving five different containers from Boost.Unordered and Abseil, each holding 10M elements. The combination of containers doesn't allow us to discern how closed vs. open addressing affects memory overhead in ASAN and Fil-C.

ASAN uses between 2.1x and 2.6x more memory than regular Clang, whereas Fil-C ranges between 1.8x and 3.9x. Results are again a mixed bag. Fil-C performs worst for the erasure scenario, perhaps because of delays in memory reclamation by this technology's embedded garbage collector.

Monday, November 10, 2025

Some experiments with Boost.Unordered on Fil-C

Fil-C is a C and C++ compiler built on top of LLVM that adds run-time memory-safety mechanisms preventing out-of-bounds and use-after-free accesses. This naturally comes at a price in execution time, so I was curious about how much of a penalty that is for a performance-oriented, relatively low-level library like Boost.Unordered.

From the user's perspective, Fil-C is basically a Clang clone, so it is fairly easy to integrate into existing toolchains. This repo shows how to plug Fil-C into Boost.Unordered's CI, which runs on GitHub Actions and is powered by Boost's own B2 build system. The most straightforward way to make B2 use Fil-C is by having a user-config.jam file like this:

using clang : : fil++ ;

which instructs B2 to use the clang toolset, with the only change being that the compiler command is fil++ instead of the default clang++.

We've encountered only minor difficulties during the process:

  • In the environment used (Linux x64), B2 automatically includes --target=x86_64-pc-linux as part of the command line, which confuses the adapted version of libc++ shipping with Fil-C. This option had to be overridden with --target=x86_64-unknown-linux-gnu (which is the default for Clang); see the sketch after this list.
  • As of this writing, Fil-C does not accept inline assembly code (asm or __asm__ blocks), which Boost.Unordered uses to provide embedded GDB pretty-printers. The feature was disabled with the macro BOOST_ALL_NO_EMBEDDED_GDB_SCRIPTS.
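For reference, one possible way to apply both tweaks from user-config.jam (a sketch; the exact option spelling may vary across B2 versions, and it relies on Clang letting the last --target flag on the command line win):

using clang : : fil++ :
  <cxxflags>--target=x86_64-unknown-linux-gnu
  <cxxflags>-DBOOST_ALL_NO_EMBEDDED_GDB_SCRIPTS
  <linkflags>--target=x86_64-unknown-linux-gnu ;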

Other than these issues, the extensive Boost.Unordered test suite compiled and ran successfully, except for some tests involving Boost.Interprocess, which uses inline assembly in some places. CI completed in around 2.5x the time it takes with a regular compiler. It is worth noting that Fil-C happily accepted the SSE2 SIMD intrinsics that Boost.Unordered crucially relies on.

We ran some performance tests compiled with Fil-C v0.674 on a Linux machine with release settings (benchmark code and setup here). The figures show execution times in ns per element for Clang 15 (solid lines) and Fil-C (dashed lines) for three containers: boost::unordered_map (closed-addressing hashmap), and boost::unordered_flat_map and boost::unordered_node_map (open addressing).


[Figures: execution times for running insertion, running erasure, successful lookup and unsuccessful lookup]

Execution with Fil-C is around 2x-4x slower, with wide variations depending on the benchmarked scenario and container of choice. Closed-addressing boost::unordered_map is the container experiencing the largest degradation, presumably because it does the most pointer chasing.

Saturday, October 4, 2025

Bulk operations in Boost.Bloom

Starting in Boost 1.90, Boost.Bloom will provide so-called bulk operations, which, in general, can speed up insertion and lookup by a sizable factor. The key idea behind this optimization is to separate in time the calculation of a position in the Bloom filter's array from its actual access. For instance, if this is the algorithm for regular insertion into a Bloom filter with k = 1 (all the snippets in this article are simplified, illustrative versions of the actual source code):

void insert(const value_type& x)
{
  auto h = hash(x);
  auto p = position(h);
  set(p, 1);
}

then the bulk-mode variant for insertion of a range of values would look like:

void insert(const std::array<value_type, N>& x)
{
  std::size_t positions[N];
  
  // pipeline position calculation and memory access
  
  for(std::size_t i = 0; i < N; ++i) {
    auto h = hash(x[i]);
    positions[i] = position(h);
    prefetch(positions[i]);
  }
  
  for(std::size_t i = 0; i < N; ++i) {
    set(positions[i], 1);
  }
}

By prefetching the memory at positions[i] well in advance of its actual usage in set(positions[i], 1), we make sure that the latter accesses a cached value, avoiding (or minimizing) the CPU stalls that would result from reaching out to cold memory. We have studied bulk optimization in more detail in the context of boost::concurrent_flat_map. You can see actual measurements of the performance gains achieved in a dedicated repo; as expected, gains are higher for larger bit arrays not fitting in the lower levels of the CPU's cache hierarchy.
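In these snippets, prefetch(positions[i]) is purely illustrative (positions are indices, not addresses). A minimal sketch of what such a helper could map to in a real implementation, assuming a hypothetical byte array data holding the filter's bits and using the GCC/Clang builtin:

#include <climits>
#include <cstddef>

inline void prefetch_bit(const unsigned char* data, std::size_t pos)
{
  // hint the CPU to load the cache line holding bit pos of the array;
  // arguments: address, 0 = read access, 3 = high temporal locality
  __builtin_prefetch(data + pos / CHAR_BIT, 0, 3);
}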

From an algorithmic point of view, the most interesting case is that of lookup operations for k > 1, since the baseline non-bulk procedure is not easily amenable to pipelining:

bool may_contain(const value_type& x)
{ 
  auto h = hash(x);
  for(int i = 0; i < k; ++i) {
    auto p = position(h);
    if(check(p) == false) return false;
    if(i < k - 1) h = next_hash(h);
  }
  return true;
}

This algorithm is branchful and can take anywhere from 1 to k iterations, the latter being the case for elements present in the filter and false positives. For instance, this diagram shows the number of steps taken to look up n = 64 elements on a filter with k = 10 and FPR = 1%, where the successful lookup rate (proportion of looked up elements actually in the filter) is p = 0.1:

As can be seen, for non-successful lookups may_contain typically stops at the first few positions: the average number of positions checked (grayed cells) is \[n\left(pk +(1-p)\frac{1-p_b^{k}}{1-p_b}\right),\] where \(p_b=\sqrt[k]{\text{FPR}}\) is the probability that an arbitrary bit in the filter's array is set to 1. In the example used, this results in only 34% of the total nk = 640 positions being checked.
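As a quick sanity check of that formula with the example's numbers: \(p_b = 0.01^{1/10} \approx 0.631\), so \((1-p_b^{k})/(1-p_b) = (1-0.01)/(1-0.631) \approx 2.68\), and the expected number of checks is \(64\,(0.1\cdot 10 + 0.9\cdot 2.68) \approx 219\), which is indeed 34% of \(nk = 640\).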

Now, a naïve bulk-mode version could look as follows:

template<typename F>
void may_contain(
  const std::array<value_type, N>& x,
  F f) // f is fed lookup results
{ 
  std::uint64_t hashes[N];
  std::size_t   positions[N];
  bool          results[N];
  
  // initial round of hash calculation and prefetching
  
  for(std::size_t i = 0; i < N; ++i) {
    hashes[i] = hash(x[i]);
    positions[i] = position(hashes[i]);
    results[i] = true;
    prefetch(positions[i]);
  }
  
  // main loop
  
  for(int j = 0; j < k; ++j) {
    for(std::size_t i = 0; i < N; ++i) {
      if(results[i]) { // conditional branch X
        results[i] &= check(positions[i]);
        if(results[i] && j < k - 1) {
          hashes[i] = next_hash(hashes[i]);
          positions[i] = position(hashes[i]);
          prefetch(positions[i]);
        }
      }
    }
  }
  
  // feed results
  
  for(std::size_t i = 0; i < N; ++i) {
    f(results[i]);
  }
}

This simply stores partial results in an array and iterates row-first instead of column-first:

The problem with this approach is that, even though it calls check exactly the same number of times as the non-bulk algorithm, the conditional branch labeled X is executed \(nk\) times, and this has a huge impact on the CPU's branch predictor. Conditional branches could in principle be eliminated altogether:

for(int j = 0; j < k; ++j) {
  for(std::size_t i = 0; i < N; ++i) {
    results[i] &= check(positions[i]);
    if(j < k - 1) { // this check is optimized away at compile time
      hashes[i] = next_hash(hashes[i]);
      positions[i] = position(hashes[i]);
      prefetch(positions[i]);
    }
  }
}

but this would result in \(nk\) calls to check, roughly 3 times the computational work of the non-bulk version (in the example above, 640 checks vs. ~219).

The challenge then is to reduce the number of iterations on each row to only those positions that still need to be evaluated. This is the solution adopted by Boost.Bloom:

template<typename F>
void may_contain(
  const std::array<value_type, N>& x,
  F f) // f is fed lookup results
{ 
  std::uint64_t hashes[N];
  std::size_t   positions[N];
  std::uint64_t results = 0; // mask of N bits
  
  // initial round of hash calculation and prefetching
  
  for(std::size_t i = 0; i < N; ++i) {
    hashes[i] = hash(x[i]);
    positions[i] = position(hashes[i]);
    results |= 1ull << i;
    prefetch(positions[i]);
  }
  
  // main loop
  
  for(int j = 0; j < k; ++j) {
    auto mask = results;
    if(!mask) break;
    do{
      auto i = std::countr_zero(mask);
      auto b = check(positions[i]);
      results &= ~(std::uint64_t(!b) << i);
      if(j < k - 1) { // this check is optimized away at compile time
        hashes[i] = next_hash(hashes[i]);
        positions[i] = position(hashes[i]);
        prefetch(positions[i]);
      }
      mask &= mask - 1; // reset least significant 1
    } while(mask);
  }
  
  // feed results
  
  for(std::size_t i = 0; i < N; ++i) {
    f(results & 1);
    results >>= 1;
  }
}

Instead of an array of partial results, we keep these as a bitmask, so that we can skip groups of terminated columns in constant time using std::countr_zero. For instance, in the 7th row the main loop does 11 iterations instead of n = 64.
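The mask-iteration idiom is worth showing on its own. A minimal, self-contained sketch (hypothetical mask value; C++20 for <bit>) that visits the indices of all 1-bits of a 64-bit mask in ascending order, skipping runs of 0-bits in constant time:

#include <bit>
#include <cstdint>
#include <iostream>

int main()
{
  std::uint64_t mask = 0b1011'0100; // surviving columns 2, 4, 5 and 7
  while(mask) {
    int i = std::countr_zero(mask); // index of least significant 1
    std::cout << i << " ";          // prints 2 4 5 7
    mask &= mask - 1;               // clear that 1
  }
}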

In summary, the bulk version of may_contain only does n more conditional branches than the non-bulk version, plus \(n(1-p)\) superfluous memory fetches (the latter could be omitted at the expense of \(n(1-p)\) additional conditional branches, but benchmarks showed that the version with extra memory fetches is actually faster). These are measured speedups of bulk vs. non-bulk lookup for a boost::bloom::filter<int, K> containing 10M elements under GCC, 64-bit mode:

array size    K     p = 1    p = 0    p = 0.1
8M            6     0.78     2.11     1.43
12M           9     1.54     2.27     1.38
16M           11    2.08     2.45     1.46
20M           14    2.24     2.57     1.43

(More results here.)

Conclusions

Boost.Bloom will introduce bulk insertion and lookup capabilities in Boost 1.90, resulting in speedups of up to 3x, though results vary greatly depending on the filter configuration and its size, and bulk operations may even perform worse than their regular counterparts in some situations. We have shown how bulk lookup is implemented for the case k > 1, where the regular, non-bulk version is highly branched and so not readily amenable to pipelining. The key technique, based on iteration reduction with std::countr_zero, can be applied outside the context of Boost.Bloom to implement efficient pipelining of early-exit operations.

Sunday, July 6, 2025

Maps on chains

(From a conversation with Vassil Vassilev.) Suppose we want to have a C++ map where the keys are disjoint integer intervals of the form [a, b]:

struct interval
{
  int min, max;
};

std::map<interval, std::string> m;

m[{0, 9}] = "ABC";
m[{10, 19}] = "DEF";

This looks easy enough: we just have to write the proper comparison operator for intervals, right?

bool operator<(const interval& x, const interval& y)
{
  return x.max < y.min;
}

But what happens if we try to insert an interval which is not disjoint with those already in the map?

m[{5, 14}] = "GHI"; // intersects both {0, 9} and {10, 19}

The short answer is that this is undefined behavior, but let's try to understand why. C++ associative containers depend on the comparison function (typically, std::less<Key>) inducing a so-called strict weak ordering on elements of Key. In layman's terms, a strict weak order < behaves as the "less than" relationship does for numbers, except that there may be incomparable elements x, y such that x ≮ y and y ≮ x; for numbers, this only happens if x = y, but in the case of a general SWO we allow for distinct, incomparable elements as long as they form equivalence classes. A convenient way to rephrase this condition is to require that incomparable elements are totally equivalent in how they compare to the rest of the elements, that is, they're truly indistinguishable from the point of view of the SWO. Getting back to our interval scenario, we have three possible cases when comparing [a, b] and [c, d]:

  • If b < c, the intervals don't overlap and [a, b] < [c, d].
  • If d < a, the intervals don't overlap and [c, d] < [a, b].
  • Otherwise, the intervals are incomparable. This can happen when [a, b] and [c, d] overlap partially or when they are exactly the same interval.

What we have described is a well-known relationship called interval order. The problem is that the interval order is not a strict weak order. Let's depict a Hasse diagram for the interval order on integer intervals [a, b] between 0 and 4:

A Hasse diagram works like this: given two elements x and y, x < y iff there is a path going upwards that connects x to y. For instance, the fact that [1, 1] < [3, 4] is confirmed by the path [1, 1] → [2, 2] → [3, 4]. But the diagram also serves to show why this relationship is not a strict weak order: for it to be so, incomparable elements (those not connected) should be indistinguishable in that they are connected upwards and downwards with the same elements, and this is clearly not the case (in fact, it is not the case for any pair of incomparable elements). In mathematical terms, our relationship is of a more general type called a strict partial order.
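The same can be checked mechanically. A minimal sketch using the naive comparison from before (renamed less here for clarity): [0, 9] and [5, 14] are incomparable, [5, 14] and [10, 19] are incomparable, yet [0, 9] < [10, 19], so incomparability is not transitive and therefore cannot be an equivalence relation:

#include <iostream>

struct interval { int min, max; };

bool less(const interval& x, const interval& y) { return x.max < y.min; }

int main()
{
  interval a{0, 9}, b{5, 14}, c{10, 19};
  std::cout << (!less(a, b) && !less(b, a)) << "\n"; // 1: a, b incomparable
  std::cout << (!less(b, c) && !less(c, b)) << "\n"; // 1: b, c incomparable
  std::cout << less(a, c) << "\n";                   // 1: but a < c
}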

Going back to C++, associative containers assume that the elements inserted form a linear arrangement with respect to <: when we try to insert a new element y that is incomparable with some previously inserted element x, the properties of strict weak orders allow us to determine that x and y are equivalent, so nothing breaks (the insertion fails as a duplicate for a std::map, or y is added next to x for a std::multimap).
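For instance, with the naive operator< above, [2, 7] is incomparable with a previously inserted [0, 9], so std::map treats both keys as equivalent (a sketch, reusing the definitions from the beginning of the article):

#include <iostream>
#include <map>
#include <string>

struct interval { int min, max; };

bool operator<(const interval& x, const interval& y) { return x.max < y.min; }

int main()
{
  std::map<interval, std::string> m;
  m[{0, 9}] = "ABC";
  auto [it, inserted] = m.insert({{2, 7}, "XYZ"});
  std::cout << inserted << " " << it->second << "\n"; // prints "0 ABC":
                                                      // insertion failed as a duplicate
}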

There's a way to accommodate our interval scenario with std::map, though. As long as the elements we are inserting belong to the same connecting path or chain, std::map can't possibly "know" whether our relationship is a strict weak order or not: it certainly looks like one for the limited subset of elements it has seen so far. Implementation-wise, we just have to make sure we're not comparing partially overlapping intervals:

struct interval_overlap: std::runtime_error
{
  interval_overlap(): std::runtime_error("interval overlap"){}
};

bool operator<(const interval& x, const interval& y)
{
  if(x.min == y.min) {
    if(x.max != y.max) throw interval_overlap();
    return false;
  }
  else if(x.min < y.min) {
    if(x.max >= y.min) throw interval_overlap();
    return true;
  }
  else /* x.min > y.min */
  {
    if(x.min <= y.max) throw interval_overlap();
    return false;
  }
}

std::map<interval, std::string> m;

m[{0, 9}] = "ABC";
m[{10, 19}] = "DEF";
m[{5, 14}] = "GHI"; // throws interval_overlap

So, when we try to insert an element that would violate the strict weak ordering constraints (that is, it lies outside the chain connecting the intervals inserted so far), an exception is thrown and no undefined behavior is hit. A strict reading of the standard would not allow this workaround, as it is required that the comparison object for the map induce a strict weak ordering for all possible values of Key, not only those in the container (or that is my interpretation, at least): for all practical purposes, though, this works and will foreseeably continue to work for all future revisions of the standard.

Bonus point. Thanks to heterogeneous lookup, we can extend our use case to support lookup for integers inside the intervals:

struct less_interval
{
  using is_transparent = void;

  bool operator()(const interval& x, const interval& y) const
  {
    // as operator< before
  }

  bool operator()(int x, const interval& y) const
  {
    return x < y.min;
  }
  
  bool operator()(const interval& x, int y) const
  {
    return x.max < y; 
  }    
};

std::map<interval, std::string, less_interval> m;
  
m[{0, 9}] = "ABC";
m[{10, 19}] = "DEF";

std::cout << m.find(5)->second << "\n"; // prints "ABC"

Exercise for the reader: Can you formally prove that this works? (Hint: define a strict weak order on ℕ ∪ I, where ℕ is the set of natural numbers and I is a collection of disjoint integer intervals.)