Bannalia: trivial notes on themes diverse: March 2014

Monday, March 31, 2014

Already/still

"Already" refers to some condition C that has begun in the past and is still in effect, in practical terms or with regard to its impact on the speaker:

"Still", on the other hand, is relative to a condition currently ongoing but bound to finish at some point in the future:

So, it would be odd to qualify a given event with both adverbs in the same sentence,

because each would claim the statement focus: one is hard-pressed to find real-life examples of this configuration.

This evening I went out for dinner at around 9:00PM or so. As daylight saving time has just kicked in in Spain, yesterday was the first time in the year that the setting sun can still be seen hiding behind rooftops at such a late hour. I exclaimed:

¡Ya es todavía de día!

which could be roughly translated to:

There's already still daylight!

So there you have it, "already" and "still" in the same sentence. In reality, the temporal structure of the sentence is not like depicted above, but rather the following:

C is the moment daylight time saving goes into effect, and C' the situation where the sun is out, that is, between dawn and dusk.The funny thing about the statement is that "already" refers to C and "still" to C', a sort of multi-scale sentence. A clearer paraphrasing reveals this:

The day has already come when there's still daylight this late!

I very clearly remember one early spring evening when I was around nine years old: those days I was taking extra-class music lessons and came back home late, in total darkness, but that very Monday when I approached the block the sun was still pouring an afterglow of dusty golden light over the horizon line. I think this was the first time I realized days shrink and grow during the year; I guess that Monday was the first one in daylight saving time (which I knew nothing about back then) and shifting had just happened the day before, so making the effect of ever expanding sunlight much more noticeable.

Friday, March 28, 2014

Complexity of lexicographical comparison

Given two strings s, s', determining whether s < s' under a lexicographical order takes a number of steps 1 ≤ C ≤ 1 + min{len(s),len(s')}, but the actual statistical distribution of C depends of course on those of the strings being compared: if they tend to share the same prefix, for instance, C will be typically larger than in the case where the strings are more randomly generated.

Let S be a probability distribution over A*, the set of strings composable with symbols from a finite alphabet A with N symbols. A convenient way to describe S is by way of Markov chains:

Strings are generated from the empty prefix Λ by the addition of new symbols from A, termination being conventionally marked by transitioning to some symbol 0 not in A. Each arrow has an associated transition probability T(p,q), p ∈ A*, q ∈ A ∪ {0}. The probability that a string has prefix p = p₁p₂...p_n is then

P(p) = P(Λ)T(Λ, p₁)T(p₁, p₂)T(p₁p₂, p₃) ··· T(p₁p₂···p_n-1, p_n),

with P(Λ) = 1. Now, comparing two strings independently extracted from A* is equivalent to analyzing their transition paths until the first divergence is found. If we denote by C(n) the probability that such comparison takes exactly n steps it is easy to see that

C(n) =Σ_p_{∈ A^n-1} P²(p)(1 - Σ_q_{∈ A} T²(p,q)),

which corresponds to the probability that the strings coincide up to the (n-1)-th symbol and then either differ on the n-th one or both terminate (that is, they don't transition to the same non-terminating symbol).

A particularly simple family of distributions for S are those where the lenghts of strings are governed by a random variable L with cumulative distribution function L(n) = Pr(L ≤ n) and non-terminating symbols occur equally likely, that is, for p ∈ Aⁿ, q ∈ A we have:

T(p,0) = (L(n) - L(n - 1))/(1- L(n - 1)),
T(p,q) = (1 - T(p,0))/N = (1/N)(1 - L(n))/(1- L(n - 1)),
P(p) = Π_{i = 0,...,n-1} (1/N)(1 - L(i))/(1- L(i - 1)) = (1/N)ⁿ (1 - L(n - 1)),

which gives

C(n) = N^n-1·(1/N)^2(n-1)(1- L(n - 2))²(1 - N·(1/N)²(1 - L(n - 1))²/(1- L(n - 2))²) =
= (1/N)^n-1(1- L(n - 2))² - (1/N)ⁿ(1- L(n - 1))²,

C(n) = D(n - 1) - D(n),
D(n) = (1/N)ⁿ(1- L(n - 1))²,

resulting in an average number of comparison steps

E[C] = Σ_{n > 0} nC(n) = Σ_{n > 0} n(D(n - 1) - D(n)) =
= Σ_{n ≥ 0} D(n) = Σ_{n ≥ 0} (1/N)ⁿ(1- L(n - 1))².

When N ≥ 2, E[C] is dominated by the first few values of C(n); if L(n -1) ≈ 0 for those values (i.e. strings are typically larger) then

E[C] ≈ Σ_n_{≥ 0} (1/N)ⁿ = N/(N - 1).

This analysis rests on the assumption that lexicographical comparison is done between independent outcomes of S, but there are scenarios such as binary search where this is not the case at all: we will see in a later entry how this impacts complexity.

Saturday, March 15, 2014

Monads and slots

Monadic lifting and slot-based programming can be combined in pleasant ways. The key idea is: if T is a type with a slot-based interface then a monadic type M<T> can easily replicate T's interface by appropriate lifting; for instance, this is the Maybe monad with slot compatibility:

template<template<typename> class M,typename T>
struct slot_monad
{
  template<typename Slot,typename ...Args>
  auto operator()(Slot&& s,Args&&... args)const
  {
    return mlift<M>(std::forward<Slot>(s))(
      static_cast<const M<T>&>(*this),
      std::forward<Args>(args)...);
  } 

  template<typename Slot,typename ...Args>
  auto operator()(Slot&& s,Args&&... args)
  {
    return mlift<M>(std::forward<Slot>(s))(
      static_cast<M<T>&>(*this),
      std::forward<Args>(args)...);
  } 
};

template<typename T>
struct maybe:boost::optional<T>,slot_monad<maybe,T>
{
  // the rest of the interface, including >>= overloads,
  // remains the same
};

slot_monad<M> provides the required magic. How does it work? Remember that slots are functors, and as such can be functionally transformed, and, in our case, lifted to the monadic realm of M; so, M<T> can replicate T's interface by reacting to any slot s with a corresponding call to mlift<M>(s). M<T> effectively becomes a drop-in replacement of T (with all the added semantics brought in by M).

Let us see this in action. All the examples in this entry are gathered into a program runnable in a C++11 environment with C++14 automatic return type detection (such as GCC 4.8 with -std=c++1y). Boost is also needed.

Dealing with optional objects

vector is the slot-based replica of std::vector we already studied in a previous entry. both vector and maybe has been equipped with iostream operator<< overloads for testing purposes.

maybe<vector<int>> v=vector<int>{};
for(int i=0;i<100;++i)v(push_back,i);
std::cout<<mlift<maybe>(accumulate())(v(begin),v(end),0)<<"\n";

Output: 4950

maybe<vector> behaves as a regular (slot-based) vector and accepts push_back without problems. Using standard algorithms on this class is, however, a bit more complicated: std::acummulate(v(begin),v(end),0) wouldn't work because v(begin) and v(end) return maybe<vector::iterator>s, which are not iterators —iterator interfaces, based on operator*, operator++, etc., are not easly translatable to slot member functions. So, we have to lift the algorithm, for which we need a polymorphic version of it:

struct accumulate
{
   template<class InputIterator, class T>
   T operator()(
     InputIterator first,InputIterator last,T init)const
   {
     return std::accumulate(first,last,init);
   }
   
  template<class InputIterator,class T,class BinaryOperation>
  T operator()(
    InputIterator first,InputIterator last,T init,
    BinaryOperation binary_op)const
  {
    return std::accumulate(first,last,init,binary_op);
  }  
};

Dealing with Nothing

As guaranteed by the Maybe monad, passing Nothing just works: member functions simply do not do anything and return Nothing.

maybe<vector<int>> v=boost::none;
for(int i=0;i<100;++i)v(push_back,i);
std::cout<<mlift<maybe>(accumulate())(v(begin),v(end),0)<<"\n";

Output: none

Optional member functions

Duality between slot-based classes and slots allows us to pass monadic types also as slots.

maybe<vector<int>> v=vector<int>{};
for(int i=0;i<100;++i)v(maybe<push_back_s>{},i);
std::cout<<mlift<maybe>(accumulate())(v(begin),v(end),0)<<"\n";

Output: 0

Here, we use a maybe<push_back_s> (push_back_s being the type of push_back) which happens to be Nothing, so the vector is not added any element.

Introducing the List monad

This monad maintains a collection of objects of the underlying type; binding to f consists in invoking repeatedly f for each of the objects and gathering the results into another list:

template<typename T>
struct mlist:std::list<T>,slot_monad<mlist,T>
{
  using super=std::list<T>;
  using super::super;
  mlist(const T& t){super::push_back(t);}
  mlist(T&& t){super::emplace_back(std::move(t));}
};

template<typename T,typename F>
auto operator>>=(const mlist<T>& m,F&& f)
{
   decltype(f(std::declval<T>())) ret={};
   for(const auto& t:m)ret.splice(ret.end(),f(t));
   return std::move(ret);
}

template<typename T,typename F>
auto operator>>=(mlist<T>& m,F&& f)
{
   decltype(f(std::declval<T>())) ret={};
   for(auto& t:m)ret.splice(ret.end(),f(t));
   return std::move(ret);
}

mlist also has an iostream operator<< that we don't show here. This monad allows us then to carry out simultaneous operations on a collection of objects:

mlist<vector<int>> ml={{0,1},{1,2}};           // #1
std::cout<<ml(size)<<"\n";
ml(push_back,1);                               // #2
std::cout<<ml<<"\n";
ml([](vector<int>& v){v(push_back,v(at,0));}); // #3
std::cout<<ml<<"\n";

Output:
{2,2}
{{0,1},{1,2}}
{{0,1,1},{1,2,1}}

In the use case #1, the results of size (std::size_ts) are compiled into an mlist<std::size_t> of their own. Use case #3 is interesting: much as we can pass any slot to mlist<vector<int>>, we can in fact pass any functor accepting vector<int> as its first parameter, such as the lambda function shown in the example. This property is exhibited by any open-ended slot-based class, that is, a class C that accepts any functor F as a permissible slot and invokes it for some underlying object O(C):

C(S,...) → S(O(C),...).

(As an effect of the duality pervading slot-based programming, plain slots such as push_back are in their turn open-ended classes.)

mlist and maybe

Different monadic types can be used jointly without surprises, as each contributes its own lifting. In the following example we deal with a list of potentially non-existent vector<int>s.

mlist<maybe<vector<int>>> ml=
  {vector<int>{0,1},vector<int>{2,3},boost::none};    // #1
std::cout<<ml(size)<<"\n";
ml(push_back,1);                                      // #2
std::cout<<ml<<"\n";
ml([](maybe<vector<int>>& v){v(push_back,v(at,0));}); // #3
std::cout<<ml<<"\n";
ml(push_back,mlist<int>{5,6,7,8});                    // #4
std::cout<<ml<<"\n";

Output:
{2,2,none}
{{0,1,1},{2,3,1},none}
{{0,1,1,0},{2,3,1,2},none}
{{0,1,1,0,5,6,7,8},{2,3,1,2,5,6,7,8},none}

Use case #4 shows how lifted member functions accept monadic types in any argument position, not only for the first one: pushing back an mlist<int> is equivalent to pushing back each of its elements one by one.

Lists of commands

Remember that slots, by their being esentially functors, can be used with global function syntax (i.e. push_back(v) rather than v(push_back)). Combining this property with mlist allows us to work with collections of commands on collections of objects:

using namespace std::placeholders;
mlist<maybe<vector<int>>> ml=
  {vector<int>{0,1},vector<int>{2,3},boost::none};
mlist<std::function<maybe<mvoid>(maybe<vector<int>>&)>>
commands=
{
  pop_back,
  std::bind(push_back,_1,0)
};
commands(ml);
std::cout<<ml<<"\n";

Output: {{0,0},{2,0},none}

The example also shows how slots can be bound much as any other regular functor.

Lifted functions on non-monadic objects

vector<int> v;
mlift<mlist>(push_back)(v,mlist<int>{0,1,2,3,4});
std::cout<<v<<"\n";

Output: {0,1,2,3,4}

vector<int> knows nothing about mlist, so push_back(v,mlist<int>{0,1,2,3,4}) woud not work. The problem is easily solved by lifting push_back.

Conclusions

Slot-based programming makes it immediate for monadic types to replicate the interfaces of their underlying types via lifting, which can be exploited to use a monad M<T> as a syntactically compatible replacement of T, with only small adjustments needed in some cases. Slot-class duality and joint use of different monadic realms combine in surprisingly flexible ways that allow for interesting behavioral transformations of non-monadic programs.

Sunday, March 9, 2014

Monadic lifting in C++

Monadic lifting is the process of transforming a (curryed) n-ary function

f : T₁ → T₂ → ··· → T_n → R

into a function lift_M(f) with domain and range in the type system associated to a monad M:

lift_M(f) : MT₁ → MT₂ → ··· → MT_n → MR

with the following behavior:

lift_M(f)(m₁, ... , m_n) :=
m₁ >>= λt₁.(m₂ >>= λt₂.( ··· (m_n >>= λt_n.return_M(f(t₁, ... , t_n)))···).

(We are using lambda notation here.) This looks more complex than it really is: lifting is basically translating a function into the realm of a monad by using the only mechanisms monads allow us to process their values, namely creating a monadic object from a non-monadic value (return_M) and injecting a monadic object into a non-monadic to monadic function f : T → MR via binding (denoted by >>=). The resulting scheme complies with a key design aspect of monads: we are not supposed to unwrap and rewrap them at will, but this is something only the monadic code can do within the implementation of >>=.

So, let us implement monadic lifting in C++. The following is a minimal monadic framewok plus an implementation of the famous Maybe monad. For brevity of exposition, C++14 automatic return type deduction is used.

struct mvoid{};

template<template<typename> class M,class T>
M<T> mreturn(T&& t)
{
  return std::forward<T>(t);
}

// maybe built on top of boost::optional
// maybe<T>(t) is Just t, boost::none acts as Nothing

template<typename T>
struct maybe:boost::optional<T>
{
  using super=boost::optional<T>;
  using super::super;
};

template<typename T,typename F>
auto operator>>=(const maybe<T>& m,F&& f)
{
  return m?f(m.get()):boost::none;
}

template<typename T,typename F>
auto operator>>=(maybe<T>& m,F&& f)
{
  return m?f(m.get()):boost::none;
}

Within this framework, a monad is a class template M mapping underlying types T to their associated monadic types M<T>, plus corresponding overloads of operator>>= (non-const monadic objects, which are alien to functional programming, are accepted here). return_M is provided by the framework (i.e. it need not be written by the monad implementer) on the assumption that M<T> can be constructed from T. The other C++-specific addition is mvoid: while in functional programming all functions return something, C++ functions can return void, which is not a proper type with associated values: we will accommodate this difficulty by lifting void-returning functions into monadic functions returning M<mvoid>s.

The task at hand is then implementing

template<template<typename> class M,typename F>
mlifted<M,F> mlift(F&& f);

with the following properties:

If F is callable with arguments T1, ... , Tn, then mlifted<M,F> is callable with arguments M1, ... , Mn, where each Mi is either Ti or M<Ti>, and the return type is M<std::result_of(F(T1,...,Tn))> (or M<mvoid> if F does not return a value).
Constness is propagated from Ti to Mi. All arguments are accepted by lvalue reference: rvalue references and move semantics do not play well with monadic code where the underlying functions can be invoked internally more than once (for instance, with collection monads).
The behavior of mlifted<M,F> follows the formal definition of lift_M, with the additional provisions that 1) non-monadic values are passed directly to the invocation of F and 2) if F returns void then it is executed internally as a side effect previously to returning M<mvoid>.

At first blush it could seem like this can be directly implemented by translating the definition lift_M to C++ via lambda functions and std::bind, but in fact it cannot, because std::bind is not curry-compatible,

int foo(int x,int y);
...
auto f=std::bind(foo,5);
f(6); // error: no match for call to std::bind<...>

that is, if the underlying function is n-ary, specifying m binding arguments does not produce a (m-n)-ary callable entity, which greatly complicate things for our task. So, we have to implement mlifted on our own. This is the general outlook of our solution:

template<template<typename> class M,typename F>
struct mlifted
{
  mlifted(const F& f):f(f){}
  mlifted(F&& f):f(std::move(f)){}
  
  template<typename ...Args>
  auto operator()(Args&&... args)
  {
    mlifted_call_context<M,F,Args...> context(f,args...);
    return context();
  }
  
private:
  F f;
};

template<template<typename> class M,typename F,typename ...Args>
struct mlifted_call_context
{
  static const std::size_t num_args=sizeof...(Args);
  
  mlifted_call_context(F& f,Args&... args):
    f(f),args(std::make_tuple(&args...)){}
  
  auto operator()()
  {
    mlifted_call_stage<M,mlifted_call_context,num_args>
     stage(*this);
    return stage();
  }
  
  F& f;
  std::tuple<
    typename std::remove_reference<Args>::type*...>
    args;
  std::tuple<
    mwrapped_t<M,typename std::remove_reference<Args>::type>*...>
    postargs;
};

mlifted::operator(M1,...,Mn) creates a call context with two sets of arguments:

A tuple of pointers to M1, ..., Mn.
A tuple of pointers to the postprocesed args T1, ..., Tn, where each Ti is either Mi in the case of non-monadic values or else the value generated by Mi during its binding (determining this underlying type is accomplished with mwrapped_t, whose simple implementation is not described).

Execution resolves then to a recursive call to

mlifted_call_stage<...,n> → ... → mlifted_call_stage<...,0>,

each taking care of the (n-i)-th argument until the final invocation of F once all Tis are resolved:

template<
  template<typename> class M,typename Context,std::size_t N
>
struct mlifted_call_stage
{
  static const std::size_t I=Context::num_args-N;
  
  mlifted_call_stage(Context& c):c(c){}
  
  auto operator()(){return operator()(*std::get<I>(c.args));}
  
  template<typename T> auto operator()(T&& t)
  {
     std::get<I>(c.postargs)=&t;
     mlifted_call_stage<M,Context,N-1> next(c);
     return next();
  }
  template<typename T> auto operator()(const M<T>& m)
  {
    return m>>=*this;
  }
  template<typename T> auto operator()(M<T>& m)
  {
    return m>>=*this;
  }

  Context& c;
};

template<template<typename> class M,typename Context>
struct mlifted_call_stage<M,Context,0>
{  
  mlifted_call_stage(Context& c):c(c){}
  
  auto operator()(){return apply_pointer_tuple(c.f,c.postargs);}
};

The actual code is a little more complicated so as to deal with the mvoid issue, and some portions, like the implementation of apply_pointer_tuple, have been omitted here, but the general idea is rather simple and there is not too much execution overhead involved (ideally, a compiler could strip most of the scaffolding away): in particular, no dynamic memory allocation is used. The entire implementation is available online; a C++11 compiler with C++14 automatic return type detection (usually via the -std=c++1y compiler option) is needed to run it.

So equipped, we can play a bit with mlift (more examples are given in the program):

int foo(int x,int y){return x*y;}
...
auto mfoo=mlift<maybe>(&foo);

mfoo(maybe<int>(2),maybe<int>(3));  // maybe<int>(6)
mfoo(2,maybe<int>(3));              // maybe<int>(6)
mfoo(
  maybe<int>(4),
  mfoo(2,maybe<int>(3)));           // maybe<int>(24)
mfoo(maybe<int>(boost::none),3);    // maybe<int>(boost::none)
mfoo(
  maybe<int>(4),
  mfoo(maybe<int>(boost::none),3)); // maybe<int>(boost::none)

This snippet is admittedly trivial, but the framework is entirely generic and works with monads and functions of arbitrary complexity. In a later entry we will explore the possibilites of monadic lifting within slot-based progamming.

Subscribe to: Posts ( Atom )