Monday, June 29, 2015

Design patterns for invariant suspension

One of the most important aspects of data encapsulation as embodied by classes and similar constructs is the ability to maintain class invariants by restricting access to the implementation data through a well defined interface. (Incidentally, a class without significant invariants is little more than a tuple of values, which is why the presence of getters and setters is a sure sign of poor design). Sometimes, however, maintaining the class invariant can lead to unacceptable costs in terms of complexity or performance. Consider two examples:
  • A sorted_vector class that maintains an internal buffer of ordered elements.
  • A spread_sheet class that implements automatic formula calculation.
These classes restrict the potential states of their implementation data so that they meet some extra requirements the interface relies on (for instance, the fact that the elements of sorted_vector are ordered allows for an lookup operations, automatic values in the spread sheet are consistent with input data cells, etc.). State can only then be changed by using the provided interface, which takes proper care that the invariant is kept upon exit from any method —and this is all good and desireable.
The problem arises when we need to do bulk modifications on the internal data and the class interface is too rigid to support this:
  • sorted_vector typically only allows for deletion of elements or insertion of new values, and does not provide mechanisms for directly modifying the elements or temporarily rearranging them.
  • spread_sheet might only accept changing one data cell at a time so as to check which formulas are affected and recalculate those alone.
If we granted unrestricted access to the data, the class would have to renormalize (i.e. restore the invariant) inside every method. Let us see this for the case of sorted_vector:
class sorted_vector
{
  value_type* data(); // unrestricted acccess to data
  iterator find(const key_type& x)const
  {
    sort();
    ...
  }
  ...
private:
  void sort()mutable;
};
This is obviously impracticable for efficiency reasons. Another alternative, sometimes found in questionable designs, is to put the burden of maintaining the invariant on the user herself:
class sorted_vector
{
  value_type* data();
  void sort();
  iterator find(const key_type& x)const // must be sorted
  ...
}
The problem with this approach is that it is all too easy to forget to restore the invariant. What is worse, there is no way that the class can detect the overlook and raise some signal (an error code or an exception). A poor attempt at solving this:
class sorted_vector
{
  value_type* data();
  void modifying_data() // to indicate invariant breach
  {
    sorted=false;
  }
  void sort();
  iterator find(const key_type& x)const
  {
    if(!sorted)throw std::runtime_exception("not sorted");
    ...
  }
  ...
private:
  bool sorted;
  ...
}
is just as brittle as the previous situation and more cumbersome for the user. If we finally renounce to giving access to the internal data, the only way to do bulk modifications of the internal state is constructing the data externally and then reassigning:
sorted_vector v;
vector aux;
... // manipulate data in aux
v=sorted_vector(aux.begin(),aux.end())
This last option is not entirely bad, but incurs serious inefficiencies related to the external allocation of the auxiliary data and the recreation of a new sorted_vector object.
We look for design patterns that enable invariant suspension (i.e. unrestricted access to a class internal data) in a controlled and maximally efficient way.
Internal invariant suspension
In this pattern, we allow for suspension only within the execution of a class method:
class X
{
  template<typename F>
  void modify_data(F f); // f is invoked with X internal data
};
How does this work? The idea is that f is a user-provided piece of code that is given access to the internal data to manipulate freely. After f is done, modify_data, which issued the invocation, can renormalize the data before exiting. Let us see this in action for sorted_vector:
class sorted_vector
{
  template<typename F>
  void modify_data(F f)
  {
    f(data);
    sort(); // restore invariant
  }
};
...
sorted_vector v;
...
v.modify_data([](value_type* data){
  ... // manipulate data as wee see fit
});
This pattern is most suitable when modifications are relatively localized in code. When the modification takes place in a more uncontrolled setting, we need something even more flexible.
External invariant suspension
Here, we move the data out and accept it back again when the user is done with it.
class X
{
  data extract_data(); // X becomes "empty"
  void accept_data(data& d); // X should be "empty"
};
At first sight, this seems just as bad as our first example with sorted_vector::data(): as soon as we grant access to the internal state, we can no longer be sure the invariant is kept. The key aspect of the pattern, however, is that extract_data sets X to some empty state, i.e., a state without internal data where the invariant is trivially true in all cases —in other words, X gives its internal data away.
class sorted_vector
{
  value_type* extract_data()
  {
    value_type res=data;
    data=nullptr;
    return res;
  }
  void accept_data(value_type*& d)
  {
    if(data)throw std::runtime("not empty");
    // alternatively: delete data
    
    data=d;
    d=nullptr;
    sort(); // restore invariant
  }
};
...
sorted_vector v;
...
value_type* data=v.extract_data()
... // manipulate data
v.accept_data(data);
How does this meet our requirements of control and efficiency?
  • Control: the invariant of v is always true regardless of the user code: when invoking extract_data, v becomes empty, i.e. a vector with zero elements which is of course sorted. The methods of sorted_vector need not check for special cases of invariant suspension.
  • Efficiency: the data is never re-created, but merely moved around; no extra allocations are required.
Conclusions
Class invariants are a key aspect of data encapsulation, but can lead to inefficient code when the class interface is too restrictive for data manipulation, We have described two general design patterns, internal invariant suspension and external invariant suspension, that allow for internal state modification in a controlled fashion without interfering with the class general interface or requiring any type of data copying.

1 comment: