Saturday, April 12, 2014

Climbing down the tree: framework and performance

Continuing with our study on optimizing lexicographical comparison as used within binary search algorithms, we now provide a fuller description of the supporting C++ framework and measure actual performance with several test sequences in different environments.
Let Compare be a binary relation inducing a strict weak ordering on values of a type T, cmp a value of Compare and x a value of T. A binary search binder for cmp and x is any object cbind for which the expressions cbind.less(y) and cbind.greater(y) are valid whenever cmp(y,x) and cmp(x,y) are valid, respectively, and return the same boolean value as these. Moreover, when the range being searched is sorted (i.e. not merely partitioned with respect to x) the implementation of a binary search binder can assume the following usage restrictions:
  • If either the invocation cbind.less(y) returns true or the invocation cbind.greater(y) returns false, subsequent invocations on cbind will be done against values not less than y under the ordering relation induced by Compare.
  • If either the invocation cbind.less(y) returns false or the invocation cbind.greater(y) returns true, subsequent invocations on cbind will be done against values not greater than y.
Standard C++ binary search algorithms such as lower_bound or binary_search can be implemented by replacing each usage of the given Compare object with the equivalent invocation of a single associated binder obtained through comp_bind, which is defined as:
template<typename Comp,typename T>
struct comp_binder
{
  comp_binder(Comp cmp,const T& x):cmp(cmp),x(x){}

  template<typename Q> bool less(const Q& y){return cmp(y,x);}
  template<typename Q> bool greater(const Q& y){return cmp(x,y);}

  Comp     cmp;
  const T& x;
};

template<typename Comp,typename T>
comp_binder<Comp,T> comp_bind(Comp cmp,const T& x)
{
  return comp_binder<Comp,T>(cmp,x);
}
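For illustration, this is how a comp_bind-compliant binary_search might look; a sketch of mine following the same pattern as the lower_bound adaptation shown in the April 5 entry below:
template<typename ForwardIterator,typename T,typename Compare>
bool binary_search(
  ForwardIterator first,ForwardIterator last,
  const T& x,Compare comp)
{
  auto cbind=comp_bind(comp,x);
  typename std::iterator_traits<
    ForwardIterator>::difference_type count,step;
  count=std::distance(first,last);
  while(count>0){
    ForwardIterator it=first;
    step=count/2;
    std::advance(it,step);
    if(cbind.less(*it)){ // replaces comp(*it,x)
      first=++it;
      count-=step+1;
    }
    else count=step;
  }
  return first!=last&&!cbind.greater(*first); // replaces comp(x,*first)
}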
(Note that comp_binder is compatible even with polymorphic comparison types such as C++14 transparent operator functors.) While providing backwards compatibility, this framework introduces new customization opportunities via specialization of comp_binder; this is precisely what prefix_less does:
template<typename T>
struct prefix_less:std::less<T>
{
};

template<typename T>
struct comp_binder<prefix_less<T>,T>
{
  bool less(const T& y);
  bool greater(const T& y);
};
T must be either an instantiation of std::basic_string or a container with lexicographical less-than semantics. prefix_less<T> is functionally equivalent to std::less<T>, but its associated binder takes advantage of the special binary search context where it is being used to provide better algorithmic complexity. The actual implementation can be consulted in the test program provided below. From the point of view of the end user, prefix_less works exactly as std::less:
std::vector<std::string> v;
...
auto it=std::lower_bound(
  v.begin(),v.end(),x,prefix_less<std::string>());
except that, if the algorithm internally uses comp_bind, the resulting performance might improve (or degrade); we will now see how the real thing behaves.
Measurements
I have written a program that compares the execution speeds of std::less and prefix_less with comp_bind-compliant versions of lower_bound and binary_search under the following testing scenarios:
template<typename Sequence,typename Compare>
void run_lower_bound(const Sequence& s,Compare cmp)
{
  for(const auto& x:s){
    lower_bound(s.begin(),s.end(),x,cmp);
  }
}

template<typename Sequence,typename Compare>
void run_binary_search(const Sequence& s,Compare cmp)
{
  for(const auto& x:s){
    binary_search(s.begin(),s.end(),x,cmp);
  }
}
An assortment of sequences is tested (in what follows S_{N,L} denotes the sorted range of strings of length L composed with the N characters '0', '1', ...; for instance, S_{2,3} is the range ["000", "001", "010", "011", "100", "101", "110", "111"]):
  • S_{2,4} (size: 16), S_{2,10} (size: 1,024), S_{2,20} (size: ~1,000,000), S_{10,4} (size: 10,000), S_{10,6} (size: 1,000,000), S_{20,4} (size: 160,000),
  • The sorted set of ~23,000 unique words within Cervantes' Don Quijote de la Mancha
Several representation types are also used to encode the elements of these sequences:
  • std::string,
  • std::wstring,
  • std::vector<udchar>, where udchar is a simple wrapper around char (a sketch is given right after this list),
  • std::vector<std::string>, i.e. each character is encoded as a separate string of length 1.
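For reference, this is a plausible definition of udchar, consistent with its description above; the actual one lives in the test program:
// user-defined char wrapper: equality and less-than are all
// that lexicographical comparison requires
struct udchar
{
  udchar(){}
  udchar(char c_):c(c_){}

  bool operator==(const udchar& x)const{return c==x.c;}
  bool operator<(const udchar& x)const{return c<x.c;}

  char c;
};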
MSVC
Tests were built with Microsoft Visual Studio 2012 using default release mode settings and run on a Windows box with an Intel Core i5-2520M CPU @2.50GHz. Depicted values are microseconds/n, where n is the number of elements in the sequence tested.
Execution times, std::string.
Execution times, std::wstring.
Execution times, std::vector<udchar>.
Execution times, std::vector<std::string>.
It is hard to beat the default implementation of lexicographical comparison for a type as simple as std::string, which usually reduces to a call to the extremely efficient std::memcmp; additionally, the theoretical performance gain of prefix_less is dwarfed by the constant bookkeeping cost associated with tracking the length of the discarded common prefix. With these considerations in mind, prefix_less performs only marginally better than std::less when the average string length is small and the representation type is simple, and excels in the opposite situations. std::vector<udchar> is an anomaly for which I do not have a clear explanation: as udchar is not a memcmp-compatible type, one would expect prefix_less to behave unconditionally better than std::less, just as with std::vector<std::string>, but in fact this is not the case; maybe prefix_less meets some inlining threshold that the simpler code of std::less manages to avoid.
GCC
Thomas Goyne has built the tests with GCC 4.9 20140330, release settings -O3 -std=c++11, and run them on an OS X 10.9.1 box with an Intel Core i7-3720QM @2.6GHz.
Execution times, std::string.
Execution times, std::wstring.
Execution times, std::vector<udchar>.
Execution times, std::vector<std::string>.
Unlike with MSVC, here prefix_less performs consistently better than std::less except for S_{2,4}, where string length is too small to compensate for the extra constant cost incurred by prefix_less. Reductions in execution times range from 10% to 50%.
Clang
Results also provided by Thomas Goyne: Clang 503.0.38 (3.4svn) with libc++, release settings -O3 -std=c++11, same OS X machine as with GCC.
Execution times, std::string.
Execution times, std::wstring.
Execution times, std::vector<udchar>.
Execution times, std::vector<std::string>.
Performance gains are similar to those obtained in the case of GCC, except when the representation type is std::vector<udchar>, in which case std::less is generally faster, again for reasons I cannot satisfactorily explain.
Conclusions
C++ binary search algorithms can be extended in a backwards compatible way to support mechanisms for lexicographical comparison with lower complexity and, generally speaking, better performance than provided by regular std::less instantiations. Although we have focused on comparison of containers and strings, there is also an additional category of entities, namely tuples and similar types, that could also benefit from this framework: this we might explore in future entries.

Saturday, April 5, 2014

Climbing down the tree

In the general case, the complexity of lexicographical comparison is almost independent of the length of the strings involved; when used within binary search through a range of strings, however, it degrades to O(length of strings). We want to devise a mechanism that restores the efficiency of lexicographical comparison by feeding contextual information on the binary search execution back into the comparison routine.
Binary search can be regarded as the process of climbing down a binary tree whose nodes are the elements of the sorted range being searched through:
Say we are looking for a given value x and denote by x_1, x_2, ... the sequence of elements compared against x during the process. The following is a fundamental property of partition-based algorithms such as binary search:
Property. For each x_i visited during the process
  • if x < x_i then x_j < x_i for all j > i,
  • if x > x_i then x_j > x_i for all j > i.
So, the search zeroes in on x by fencing it into a narrowing interval [l_i, u_i] ("l" and "u" stand for lower and upper bound, respectively) that is implicitly calculated as
  • [l_0, u_0] = [smallest string, largest string],
  • if x < x_i then [l_i, u_i] = [l_{i−1}, x_i],
  • if x > x_i then [l_i, u_i] = [x_i, u_{i−1}].
The key observation here in connection with lexicographical comparison is: if, at stage i, l_i and u_i are strings sharing a common prefix p_i, then all subsequent comparisons will be done against strings beginning with this same prefix. Intelligent exploitation of this fact spares us the need to check the prefix, so that comparison can start at mid-string position.
Let us put the idea in C++. The canonical implementation of std::lower_bound looks like this:
template<typename ForwardIterator,typename T,typename Compare>
ForwardIterator lower_bound(
  ForwardIterator first,ForwardIterator last,
  const T& x,Compare comp)
{
  ForwardIterator it;
  typename std::iterator_traits<
    ForwardIterator>::difference_type count,step;
  count=std::distance(first,last);
  while(count>0){
    it=first;
    step=count/2;
    std::advance(it,step);
    if(comp(*it,x)){
      first=++it;
      count-=step+1;
    }
    else count=step;
  }
  return first;
}
where comp is a functor comparing T values for inequality (typically std::less<T>). This interface does not serve our purposes because, when invoking comp(x,y), we are not indicating which argument is the element being searched for and which is the range element it is compared against. So let us tweak the code a bit to make this information explicit:
{
  ...
  auto cbind=comp_bind(comp,x);
  while(count>0){
    it=first;
    step=count/2;
    std::advance(it,step);
    if(cbind.less(*it)){
      first=++it;
      count-=step+1;
    }
    else count=step;
  }
  return first;
}
We will not delve now into the implementation of the framework comp_bind relies on: suffice it to say that it passes x to a comparison object with the following interface: 
template<typename Comp,typename Value>
struct comp_binder
{
  comp_binder(const Comp& cmp,const Value& x);

  bool less(const Value& y);    // ~ cmp(y,x)
  bool greater(const Value& y); // ~ cmp(x,y)
};
This is then a mere change of interface with respect to that of std::less, but one that allows us to keep relevant contextual information on the execution of lower_bound within the comparison object itself, information that an optimized lexicographical comparison object can take advantage of:
template<typename T>
struct comp_binder<prefix_less<T>,T>
{
  comp_binder(const prefix_less<T>&,const T& x):
    x(x),pref_left(0),pref_right(0)
  {}

  bool less(const T& y)
  {
    auto n=std::min(pref_left,pref_right);
    auto m=std::min(x.size(),y.size());
    auto it1=x.begin()+n,it2=y.begin()+n;
    for(;n!=m&&*it1==*it2;++n,++it1,++it2);
    return (n!=m?*it2<*it1:n<x.size())?
      (pref_left=n,true):
      (pref_right=n,false);
  }

  bool greater(const T& y); // similar to less

  const T&              x;
  typename T::size_type pref_left,pref_right;
};
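greater is declared above but not defined in the entry; a sketch mirroring less, under the same member layout (a true result makes y the new upper bounding element, so pref_right is updated; a false result updates pref_left):
bool greater(const T& y)
{
  auto n=std::min(pref_left,pref_right);
  auto m=std::min(x.size(),y.size());
  auto it1=x.begin()+n,it2=y.begin()+n;
  for(;n!=m&&*it1==*it2;++n,++it1,++it2); // skip known common prefix
  return (n!=m?*it1<*it2:n<y.size())?
    (pref_right=n,true):
    (pref_left=n,false);
}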
The object keeps in pref_left the length of the common prefix between the latest lower bounding element and x, and similarly with pref_right for upper bounding elements. By the fundamental property of partition-based algorithms, it is guaranteed that further comparisons will be done only against strings sharing with x a common prefix of length ≥ min{pref_left, pref_right}, a fact we can use to shorten the checking loop.
I have put together an instrumented version of this scheme into a test program that calculates the resulting complexity of optimized lexicographical comparison for the sorted range of the N^L strings of length L constructible with an alphabet of N symbols, denoted in what follows by E[C^{opt}_{N,L}]:
E[C^{opt}_{N,L}] as a function of L.
Remember that E[C_{N,L}] is the average number of symbols checked when strings are randomly extracted from the range and E[C'_{N,L}] is the corresponding value in the case that comparison happens within binary search. We have then
E[C_{N,L}] ≤ E[C^{opt}_{N,L}] ≤ E[C'_{N,L}],
E[C^{opt}_{N,L}] is bounded as N, L → ∞
(we won't prove this); so, we have restored O(1) complexity for lexicographical comparison, though the figures are still around two times higher than in the "free-floating" scenario.
In a later entry we will elaborate on the definition of comp_bind to support a backwards-compatible framework for less-based standard binary search algorithms and will also measure actual execution times with and without this optimization.

Wednesday, April 2, 2014

Complexity of lexicographical comparison: part II

Let S_{N,L} be the equiprobable sample space of strings of length L ≥ 1 over an alphabet A of N ≥ 2 symbols. For instance, S_{2,2} with A = {a, b} (which particular symbols A consists of is immaterial to our discussion) runs over the strings
aa, ab, ba, bb,
each with probability 1/4. According to the representation model introduced in an earlier entry, S_{N,L} is characterized by
L(n) = 0 for n < L,
L(n) = 1 for n ≥ L,
T(p,q) = (1 − T(p,0))/N for p ∈ A*, q ∈ A,
and the average number of steps it takes to lexicographically compare two independent outcomes of S, which we proved for the general case to be
E[C] = Σ_{n ≥ 0} (1/N)^n (1 − L(n−1))^2,
reduces here to
E[C_{N,L}] = Σ_{0 ≤ n ≤ L} (1/N)^n = (N^{L+1} − 1)/(N^{L+1} − N^L),
tending to N/(N − 1) as L → ∞. The figure shows E[C_{N,L}] as a function of L for various values of N.
E[C_{N,L}] as a function of L.
But lexicographical comparison does not perform so well in other, very common scenarios. Suppose we form a sequence s = (s_1, ..., s_{N^L}) with the sorted values of S_{N,L} and perform on it a binary search for some s_i extracted at random:
bs(s_i, s).
This operation does approximately log_2 N^L = L·log_2 N comparisons between strings in s. We want now to calculate the average number of steps (i.e. symbols checked) these comparisons take, which we denote by E[C'_{N,L}]. A simple C++ program exercising std::lower_bound helps us obtain the figures:
E[C'_{N,L}] as a function of L.
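The program itself is not reproduced in the entry; the core of the instrumentation can be sketched as follows (counting_less is a name of my own choosing):
#include <algorithm>
#include <cstddef>
#include <string>

// lexicographical less-than that counts the symbols checked
// on each invocation, accumulating them into an external counter
struct counting_less
{
  counting_less(std::size_t& steps):steps(steps){}

  bool operator()(const std::string& s1,const std::string& s2)const
  {
    std::size_t n=0,m=std::min(s1.size(),s2.size());
    while(n<m&&s1[n]==s2[n])++n;
    steps+=n+1; // symbols checked, including the deciding position
    return n<m?s1[n]<s2[n]:s1.size()<s2.size();
  }

  std::size_t& steps;
};
Averaging steps over all the comparisons issued by std::lower_bound across many random searches yields the figures plotted above.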
E[C'_{N,L}] is indeed different from E[C_{N,L}] and in fact grows linearly with the length of the strings in s. The reason is that the algorithm for searching s_i iteratively touches on strings more and more similar to s_i, that is, sharing an increasingly longer common prefix with s_i, which imposes a penalty on lexicographical comparison. We can make a crude estimation of E[C'_{N,L}]: as each step of binary search gains one extra bit of information, the common prefix grows by 1/(log_2 N) symbols per step, yielding an average common prefix length per comparison of
(Σ_{1 ≤ n ≤ L·log_2 N − 1} n/(log_2 N)) / (L·log_2 N) = (1/2)(L − 1/(log_2 N)),
to which an additional term 1 ≤ c < N/(N − 1) must be added to account for the comparison of the remaining string suffixes.
Lexicographical comparison as used in binary searching is then a rather inefficient O(length of strings). We will see in a later entry how to improve complexity by incorporating contextual information to the execution of the search algorithm.

Monday, March 31, 2014

Already/still

"Already" refers to some condition C that has begun in the past and is still in effect, in practical terms or with regard to its impact on the speaker:
"Still", on the other hand, is relative to a condition currently ongoing but bound to finish at some point in the future:
So, it would be odd to qualify a given event with both adverbs in the same sentence,
because each would claim the statement focus: one is hard-pressed to find real-life examples of this configuration. 
This evening I went out for dinner at around 9:00PM. As daylight saving time has just kicked in in Spain, yesterday was the first time this year that the setting sun could still be seen hiding behind rooftops at such a late hour. I exclaimed:
¡Ya es todavía de día!
which could be roughly translated to:
There's already still daylight!
So there you have it, "already" and "still" in the same sentence. In reality, the temporal structure of the sentence is not as depicted above, but rather the following:
C is the moment daylight saving time goes into effect, and C' the situation where the sun is out, that is, between dawn and dusk. The funny thing about the statement is that "already" refers to C and "still" to C', a sort of multi-scale sentence. A clearer paraphrasing reveals this:
The day has already come when there's still daylight this late!
I very clearly remember one early spring evening when I was around nine years old: in those days I was taking extracurricular music lessons and came back home late, in total darkness, but that very Monday, as I approached the block, the sun was still pouring an afterglow of dusty golden light over the horizon line. I think this was the first time I realized days shrink and grow during the year; I guess that Monday was the first one in daylight saving time (which I knew nothing about back then) and the shift had just happened the day before, making the effect of ever expanding sunlight much more noticeable.

Friday, March 28, 2014

Complexity of lexicographical comparison

Given two strings s, s', determining whether s < s' under a lexicographical order takes a number of steps 1 ≤ C ≤ 1 + min{len(s), len(s')}, but the actual statistical distribution of C depends of course on those of the strings being compared: if they tend to share the same prefix, for instance, C will typically be larger than in the case where the strings are generated more randomly.
Let S be a probability distribution over A*, the set of strings composable with symbols from a finite alphabet A with N symbols. A convenient way to describe S is by way of Markov chains:
Strings are generated from the empty prefix Λ by the addition of new symbols from A, termination being conventionally marked by transitioning to some symbol 0 not in A. Each arrow has an associated transition probability T(p,q), p ∈ A*, q ∈ A ∪ {0}. The probability that a string has prefix p = p_1p_2···p_n is then
P(p) = P(Λ)·T(Λ,p_1)·T(p_1,p_2)·T(p_1p_2,p_3) ··· T(p_1p_2···p_{n−1},p_n),
with P(Λ) = 1. Now, comparing two strings independently extracted from A* is equivalent to analyzing their transition paths until the first divergence is found. If we denote by C(n) the probability that such a comparison takes exactly n steps, it is easy to see that
C(n) = Σ_{p ∈ A^{n−1}} P(p)^2 (1 − Σ_{q ∈ A} T(p,q)^2),
which corresponds to the probability that the strings coincide up to the (n−1)-th symbol and then either differ on the n-th one or both terminate (that is, they don't transition to the same non-terminating symbol).
A particularly simple family of distributions for S are those where the lengths of strings are governed by a random variable L with cumulative distribution function L(n) = Pr(L ≤ n) and non-terminating symbols occur equally likely; that is, for p ∈ A^n, q ∈ A we have:
T(p,0) = (L(n) − L(n−1))/(1 − L(n−1)),
T(p,q) = (1 − T(p,0))/N = (1/N)(1 − L(n))/(1 − L(n−1)),
P(p) = Π_{i = 0,...,n−1} (1/N)(1 − L(i))/(1 − L(i−1)) = (1/N)^n (1 − L(n−1)),
which gives
C(n) = N^{n−1}·(1/N)^{2(n−1)} (1 − L(n−2))^2 (1 − N·(1/N)^2 (1 − L(n−1))^2/(1 − L(n−2))^2) =
= (1/N)^{n−1} (1 − L(n−2))^2 − (1/N)^n (1 − L(n−1))^2,
or
C(n) = D(n−1) − D(n),
D(n) = (1/N)^n (1 − L(n−1))^2,
resulting in an average number of comparison steps
E[C] = Σ_{n > 0} n·C(n) = Σ_{n > 0} n (D(n−1) − D(n)) =
= Σ_{n ≥ 0} D(n) = Σ_{n ≥ 0} (1/N)^n (1 − L(n−1))^2.
When N ≥ 2, E[C] is dominated by the first few values of C(n); if L(n−1) ≈ 0 for those values (i.e. strings are typically longer than that) then
E[C] ≈ Σ_{n ≥ 0} (1/N)^n = N/(N − 1).
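As a quick sanity check (mine, not part of the entry), a simulation for N = 2 with strings long enough that termination is virtually never reached should print a value close to N/(N − 1) = 2:
#include <cstddef>
#include <iostream>
#include <random>
#include <string>

int main()
{
  const int         N=2,L=64;
  const std::size_t pairs=1000000;
  std::mt19937 gen(34862);
  std::uniform_int_distribution<int> d(0,N-1);
  auto make=[&]{
    std::string s;
    for(int i=0;i<L;++i)s+=char('0'+d(gen));
    return s;
  };
  unsigned long long steps=0;
  for(std::size_t i=0;i<pairs;++i){
    std::string s1=make(),s2=make();
    std::size_t n=0;
    while(n<L&&s1[n]==s2[n])++n; // common prefix length
    steps+=n+1;                  // plus the deciding symbol
  }
  std::cout<<double(steps)/pairs<<"\n"; // ≈ 2 for N = 2
}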
This analysis rests on the assumption that lexicographical comparison is done between independent outcomes of S, but there are scenarios such as binary search where this is not the case at all: we will see in a later entry how this impacts complexity.

Saturday, March 15, 2014

Monads and slots

Monadic lifting and slot-based programming can be combined in pleasant ways. The key idea is: if T is a type with a slot-based interface then a monadic type M<T> can easily replicate T's interface by appropriate lifting; for instance, this is the Maybe monad with slot compatibility:
template<template<typename> class M,typename T>
struct slot_monad
{
  template<typename Slot,typename ...Args>
  auto operator()(Slot&& s,Args&&... args)const
  {
    return mlift<M>(std::forward<Slot>(s))(
      static_cast<const M<T>&>(*this),
      std::forward<Args>(args)...);
  } 

  template<typename Slot,typename ...Args>
  auto operator()(Slot&& s,Args&&... args)
  {
    return mlift<M>(std::forward<Slot>(s))(
      static_cast<M<T>&>(*this),
      std::forward<Args>(args)...);
  } 
};

template<typename T>
struct maybe:boost::optional<T>,slot_monad<maybe,T>
{
  // the rest of the interface, including >>= overloads,
  // remains the same
};
slot_monad<M,T> provides the required magic. How does it work? Remember that slots are functors, and as such can be functionally transformed and, in our case, lifted to the monadic realm of M; so, M<T> can replicate T's interface by reacting to any slot s with a corresponding call to mlift<M>(s). M<T> effectively becomes a drop-in replacement of T (with all the added semantics brought in by M).
Let us see this in action. All the examples in this entry are gathered into a program runnable in a C++11 environment with C++14 automatic return type detection (such as GCC 4.8 with -std=c++1y). Boost is also needed.
Dealing with optional objects
vector is the slot-based replica of std::vector we already studied in a previous entry. Both vector and maybe have been equipped with iostream operator<< overloads for testing purposes.
maybe<vector<int>> v=vector<int>{};
for(int i=0;i<100;++i)v(push_back,i);
std::cout<<mlift<maybe>(accumulate())(v(begin),v(end),0)<<"\n";

Output: 4950
maybe<vector<int>> behaves as a regular (slot-based) vector and accepts push_back without problems. Using standard algorithms on this class is, however, a bit more complicated: std::accumulate(v(begin),v(end),0) wouldn't work because v(begin) and v(end) return maybe<vector::iterator>s, which are not iterators (iterator interfaces, based on operator*, operator++, etc., are not easily translatable to slot member functions). So, we have to lift the algorithm, for which we need a polymorphic version of it:
struct accumulate
{
   template<class InputIterator, class T>
   T operator()(
     InputIterator first,InputIterator last,T init)const
   {
     return std::accumulate(first,last,init);
   }
   
  template<class InputIterator,class T,class BinaryOperation>
  T operator()(
    InputIterator first,InputIterator last,T init,
    BinaryOperation binary_op)const
  {
    return std::accumulate(first,last,init,binary_op);
  }  
};
Dealing with Nothing
As guaranteed by the Maybe monad, passing Nothing just works: member functions simply do not do anything and return Nothing.
maybe<vector<int>> v=boost::none;
for(int i=0;i<100;++i)v(push_back,i);
std::cout<<mlift<maybe>(accumulate())(v(begin),v(end),0)<<"\n";

Output: none
Optional member functions
Duality between slot-based classes and slots allows us to pass monadic types also as slots.
maybe<vector<int>> v=vector<int>{};
for(int i=0;i<100;++i)v(maybe<push_back_s>{},i);
std::cout<<mlift<maybe>(accumulate())(v(begin),v(end),0)<<"\n";

Output: 0
Here, we use a maybe<push_back_s> (push_back_s being the type of push_back) which happens to be Nothing, so no element is added to the vector.
Introducing the List monad
This monad maintains a collection of objects of the underlying type; binding to f consists in invoking f repeatedly, once for each of the objects, and gathering the results into another list:
template<typename T>
struct mlist:std::list<T>,slot_monad<mlist,T>
{
  using super=std::list<T>;
  using super::super;
  mlist(const T& t){super::push_back(t);}
  mlist(T&& t){super::emplace_back(std::move(t));}
};

template<typename T,typename F>
auto operator>>=(const mlist<T>& m,F&& f)
{
   decltype(f(std::declval<T>())) ret={};
   for(const auto& t:m)ret.splice(ret.end(),f(t));
   return std::move(ret);
}

template<typename T,typename F>
auto operator>>=(mlist<T>& m,F&& f)
{
   decltype(f(std::declval<T>())) ret={};
   for(auto& t:m)ret.splice(ret.end(),f(t));
   return std::move(ret);
}
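The stream inserter for mlist is not shown in the entry; a minimal guess of mine, consistent with the brace-delimited outputs below:
template<typename T>
std::ostream& operator<<(std::ostream& os,const mlist<T>& m)
{
  // prints {t1,t2,...}, relying on each element's own operator<<
  os<<"{";
  bool first=true;
  for(const auto& t:m){
    if(!first)os<<",";
    first=false;
    os<<t;
  }
  return os<<"}";
}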
mlist also has an iostream operator<< along those lines. This monad then allows us to carry out simultaneous operations on a collection of objects:
mlist<vector<int>> ml={{0,1},{1,2}};           // #1
std::cout<<ml(size)<<"\n";
ml(push_back,1);                               // #2
std::cout<<ml<<"\n";
ml([](vector<int>& v){v(push_back,v(at,0));}); // #3
std::cout<<ml<<"\n";

Output:
{2,2}
{{0,1},{1,2}}
{{0,1,1},{1,2,1}}
In use case #1, the results of size (std::size_ts) are compiled into an mlist<std::size_t> of their own. Use case #3 is interesting: much as we can pass any slot to mlist<vector<int>>, we can in fact pass any functor accepting vector<int> as its first parameter, such as the lambda function shown in the example. This property is exhibited by any open-ended slot-based class, that is, a class C that accepts any functor S as a permissible slot and invokes it on some underlying object O(C):
C(S,...) → S(O(C),...).
(As an effect of the duality pervading slot-based programming, plain slots such as push_back are in their turn open-ended classes.)
mlist and maybe
Different monadic types can be used jointly without surprises, as each contributes its own lifting. In the following example we deal with a list of potentially non-existent vector<int>s.
mlist<maybe<vector<int>>> ml=
  {vector<int>{0,1},vector<int>{2,3},boost::none};    // #1
std::cout<<ml(size)<<"\n";
ml(push_back,1);                                      // #2
std::cout<<ml<<"\n";
ml([](maybe<vector<int>>& v){v(push_back,v(at,0));}); // #3
std::cout<<ml<<"\n";
ml(push_back,mlist<int>{5,6,7,8});                    // #4
std::cout<<ml<<"\n";

Output:
{2,2,none}
{{0,1,1},{2,3,1},none}
{{0,1,1,0},{2,3,1,2},none}
{{0,1,1,0,5,6,7,8},{2,3,1,2,5,6,7,8},none}
Use case #4 shows how lifted member functions accept monadic types in any argument position, not only for the first one: pushing back an mlist<int> is equivalent to pushing back each of its elements one by one.
Lists of commands
Remember that slots, by their being essentially functors, can be used with global function syntax (i.e. push_back(v) rather than v(push_back)). Combining this property with mlist allows us to work with collections of commands on collections of objects:
using namespace std::placeholders;
mlist<maybe<vector<int>>> ml=
  {vector<int>{0,1},vector<int>{2,3},boost::none};
mlist<std::function<maybe<mvoid>(maybe<vector<int>>&)>>
commands=
{
  pop_back,
  std::bind(push_back,_1,0)
};
commands(ml);
std::cout<<ml<<"\n";

Output: {{0,0},{2,0},none}
The example also shows how slots can be bound much as any other regular functor.
Lifted functions on non-monadic objects
vector<int> v;
mlift<mlist>(push_back)(v,mlist<int>{0,1,2,3,4});
std::cout<<v<<"\n";

Output: {0,1,2,3,4}
vector<int> knows nothing about mlist, so push_back(v,mlist<int>{0,1,2,3,4}) would not work. The problem is easily solved by lifting push_back.
Conclusions
Slot-based programming makes it immediate for monadic types to replicate the interfaces of their underlying types via lifting, which can be exploited to use a monad M<T> as a syntactically compatible replacement of T, with only small adjustments needed in some cases. Slot-class duality and joint use of different monadic realms combine in surprisingly flexible ways that allow for interesting behavioral transformations of non-monadic programs.

Sunday, March 9, 2014

Monadic lifting in C++

Monadic lifting is the process of transforming a (curried) n-ary function
f : T_1 → T_2 → ··· → T_n → R
into a function lift_M(f) with domain and range in the type system associated to a monad M:
lift_M(f) : M<T_1> → M<T_2> → ··· → M<T_n> → M<R>
with the following behavior:
lift_M(f)(m_1, ..., m_n) :=
m_1 >>= λt_1.(m_2 >>= λt_2.( ··· (m_n >>= λt_n.return_M(f(t_1, ..., t_n))) ··· )).
(We are using lambda notation here.) This looks more complex than it really is: lifting is basically translating a function into the realm of a monad by using the only mechanisms monads give us to process their values, namely creating a monadic object from a non-monadic value (return_M) and injecting a monadic object into a non-monadic-to-monadic function f : T → M<R> via binding (denoted by >>=). The resulting scheme complies with a key design aspect of monads: we are not supposed to unwrap and rewrap them at will; that is something only the monadic code itself can do, within the implementation of >>=.
So, let us implement monadic lifting in C++. The following is a minimal monadic framework plus an implementation of the famous Maybe monad. For brevity of exposition, C++14 automatic return type deduction is used.
struct mvoid{};

template<template<typename> class M,class T>
M<T> mreturn(T&& t)
{
  return std::forward<T>(t);
}

// maybe built on top of boost::optional
// maybe<T>(t) is Just t, boost::none acts as Nothing

template<typename T>
struct maybe:boost::optional<T>
{
  using super=boost::optional<T>;
  using super::super;
};

template<typename T,typename F>
auto operator>>=(const maybe<T>& m,F&& f)
{
  return m?f(m.get()):boost::none;
}

template<typename T,typename F>
auto operator>>=(maybe<T>& m,F&& f)
{
  return m?f(m.get()):boost::none;
}
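Before moving on to lifting, a quick illustration of direct binding (my example, not in the entry):
maybe<int> m=2;
auto r1=m>>=[](int x){return mreturn<maybe>(x*3);}; // Just 6

maybe<int> n=boost::none;
auto r2=n>>=[](int x){return mreturn<maybe>(x*3);}; // Nothing
mreturn wraps the result back into the monad, and binding on Nothing short-circuits as expected.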
Within this framework, a monad is a class template M mapping underlying types T to their associated monadic types M<T>, plus corresponding overloads of operator>>= (non-const monadic objects, which are alien to functional programming, are accepted here). return_M (mreturn in the code) is provided by the framework (i.e. it need not be written by the monad implementer) on the assumption that M<T> can be constructed from T. The other C++-specific addition is mvoid: while in functional programming all functions return something, C++ functions can return void, which is not a proper type with associated values; we will accommodate this difficulty by lifting void-returning functions into monadic functions returning M<mvoid>s.
The task at hand is then implementing
template<template<typename> class M,typename F>
mlifted<M,F> mlift(F&& f);
with the following properties:
  • If F is callable with arguments T_1, ..., T_n, then mlifted<M,F> is callable with arguments M_1, ..., M_n, where each M_i is either T_i or M<T_i>, and the return type is M<std::result_of(F(T_1,...,T_n))> (or M<mvoid> if F does not return a value).
  • Constness is propagated from T_i to M_i. All arguments are accepted by lvalue reference: rvalue references and move semantics do not play well with monadic code where the underlying functions can be invoked internally more than once (for instance, with collection monads).
  • The behavior of mlifted<M,F> follows the formal definition of lift_M, with the additional provisions that 1) non-monadic values are passed directly to the invocation of F and 2) if F returns void then it is executed internally as a side effect prior to returning M<mvoid>.
At first blush it could seem like this can be directly implemented by translating the definition of lift_M to C++ via lambda functions and std::bind, but in fact it cannot, because std::bind is not curry-compatible,
int foo(int x,int y);
...
auto f=std::bind(foo,5);
f(6); // error: no match for call to std::bind<...>
that is, if the underlying function is n-ary, specifying m bound arguments does not produce an (n−m)-ary callable entity, which greatly complicates things for our task. So, we have to implement mlifted on our own. This is the general outlook of our solution:
template<template<typename> class M,typename F>
struct mlifted
{
  mlifted(const F& f):f(f){}
  mlifted(F&& f):f(std::move(f)){}
  
  template<typename ...Args>
  auto operator()(Args&&... args)
  {
    mlifted_call_context<M,F,Args...> context(f,args...);
    return context();
  }
  
private:
  F f;
};

template<template<typename> class M,typename F,typename ...Args>
struct mlifted_call_context
{
  static const std::size_t num_args=sizeof...(Args);
  
  mlifted_call_context(F& f,Args&... args):
    f(f),args(std::make_tuple(&args...)){}
  
  auto operator()()
  {
    mlifted_call_stage<M,mlifted_call_context,num_args>
     stage(*this);
    return stage();
  }
  
  F& f;
  std::tuple<
    typename std::remove_reference<Args>::type*...>
    args;
  std::tuple<
    mwrapped_t<M,typename std::remove_reference<Args>::type>*...>
    postargs;
};
mlifted::operator()(M_1,...,M_n) creates a call context with two sets of arguments:
  • A tuple of pointers to M_1, ..., M_n.
  • A tuple of pointers to the postprocessed args T_1, ..., T_n, where each T_i is either M_i, in the case of non-monadic values, or else the value generated by M_i during its binding (determining this underlying type is accomplished with mwrapped_t, whose simple implementation is not described).
Execution then resolves to a chain of recursive calls
mlifted_call_stage<...,n> → ··· → mlifted_call_stage<...,0>,
with the i-th stage taking care of the (n−i)-th argument, until the final invocation of F once all the T_is are resolved:
template<
  template<typename> class M,typename Context,std::size_t N
>
struct mlifted_call_stage
{
  static const std::size_t I=Context::num_args-N;
  
  mlifted_call_stage(Context& c):c(c){}
  
  auto operator()(){return operator()(*std::get<I>(c.args));}
  
  template<typename T> auto operator()(T&& t)
  {
     std::get<I>(c.postargs)=&t;
     mlifted_call_stage<M,Context,N-1> next(c);
     return next();
  }
  template<typename T> auto operator()(const M<T>& m)
  {
    return m>>=*this;
  }
  template<typename T> auto operator()(M<T>& m)
  {
    return m>>=*this;
  }

  Context& c;
};

template<template<typename> class M,typename Context>
struct mlifted_call_stage<M,Context,0>
{  
  mlifted_call_stage(Context& c):c(c){}
  
  auto operator()(){return apply_pointer_tuple(c.f,c.postargs);}
};
The actual code is a little more complicated so as to deal with the mvoid issue, and some portions, like the implementation of apply_pointer_tuple, have been omitted here, but the general idea is rather simple and there is not too much execution overhead involved (ideally, a compiler could strip most of the scaffolding away): in particular, no dynamic memory allocation is used. The entire implementation is available online; a C++11 compiler with C++14 automatic return type detection (usually via the -std=c++1y compiler option) is needed to run it.
So equipped, we can play a bit with mlift (more examples are given in the program):
int foo(int x,int y){return x*y;}
...
auto mfoo=mlift<maybe>(&foo);

mfoo(maybe<int>(2),maybe<int>(3));  // maybe<int>(6)
mfoo(2,maybe<int>(3));              // maybe<int>(6)
mfoo(
  maybe<int>(4),
  mfoo(2,maybe<int>(3)));           // maybe<int>(24)
mfoo(maybe<int>(boost::none),3);    // maybe<int>(boost::none)
mfoo(
  maybe<int>(4),
  mfoo(maybe<int>(boost::none),3)); // maybe<int>(boost::none)
This snippet is admittedly trivial, but the framework is entirely generic and works with monads and functions of arbitrary complexity. In a later entry we will explore the possibilities of monadic lifting within slot-based programming.