Monday, August 31, 2015

C++ encapsulation for Data-Oriented Design

Data-Oriented Design, or DOD for short, seeks to maximize efficiency by laying out data in such a way that their processing is as streamlined as possible. This is often against the usual object-based principles that naturally lead to grouping the information accordingly to the user-domain entities that it models. Consider for instance a game where large quantities of particles are rendered and moved around:
class particle
{  
  char  color;
  int   x;
  int   y;
  int   dx;
  int   dy;
public:

  static const int max_x=200;
  static const int max_y=100;
    
  particle(char color_,int x_,int y_,int dx_,int dy_):
    color(color_),x(x_),y(y_),dx(dx_),dy(dy_)
  {}

  void render()const
  {
    // for explanatory purposes only: dump to std::cout
    std::cout<<"["<<x<<","<<y<<","<<int(color)<<"]\n";
  }

  void move()
  {
    x+=dx;
    if(x<0){
      x*=-1;
      dx*=-1;
    }
    else if(x>max_x){
      x=2*max_x-x;
      dx*=-1;      
    }
    
    y+=dy;
    if(y<0){
      y*=-1;
      dx*=-1;
    }
    else if(y>max_y){
      y=2*max_y-y;
      dy*=-1;      
    }
  }
};
...
// game particles
std::vector<particle> particles;
In the rendering loop, the program might do:
for(const auto& p:particles)p.render();
Trivial as it seems, the execution speed of this approach is nevertheless suboptimal. The memory layout for particles looks like:
which, when traversed in the rendering loop, results in 47% of the data cached by the CPU (the part corresponding to dx and dy, in white) not being used, or even more if padding occurs. A more intelligent layout based on 5 different vectors
allows the needed data, and only this, to be cached in three parallel cache lines, thus maximizing occupancy and minimizing misses. For the moving loop, it is a different set of data vectors that must be provided. DOD is increasingly popular, in particular in very demanding areas such as game programming. Mike Acton's presentation on DOD and C++ is an excellent introduction to the principles of data orientation.
The problem with DOD is that encapsulation is lost: rather than being nicely packed in contiguous chunks of memory whose lifetime management is heavily supported by the language rules, "objects" now live as virtual entities with disemboweled, scattered pieces of information floating around in separate data structures. Methods acting on the data need to publish the exact information they require as part of their interface, and it is the responsibility of the user to locate it and provide it. We want to explore ways to remedy this situation by allowing a modest level of object encapsulation compatible with DOD.
Roughly speaking, in C++ an object serves two different purposes:
  • Providing a public interface (a set of member functions) acting on the associated data.
  • Keeping access to the data and managing its lifetime.
Both roles are mediated by the this pointer. In fact, executing a member function on an object
x.f(args...);
is conceptually equivalent to invoking a function with an implicit extra argument
X::f(this,args...);
where the data associated to x, assumed to be contiguous, is pointed to by this. We can break this intermediation by letting objects be supplied with an access entity replacing this for the purpose of reaching out to the information. We begin with a purely syntactic device:
template<typename T,int Tag=0>
struct member
{
  using type=T;
  static const int tag=Tag;
};
member<T,Tag> will be used to specify that a given class has a piece of information with type T. Tag is needed to tell apart different members of the same type (for instance, particle has four different members of type int, namely x, y, dx and dy). Now, the following class:
template<typename Member>
class access
{
  using type=typename Member::type;
  type* p;

public:
  access(type* p):p(p){}
  
  type&       get(Member){return *p;}
  const type& get(Member)const{return *p;}
};
stores a pointer to a piece of data accessing the specified member. This can be easily expanded to accommodate for more than one member:
template<typename... Members>class access;

template<typename Member>
class access<Member>
{
  using type=typename Member::type;
  type* p;

public:
  access(type* p):p(p){}
  
  type&       get(Member){return *p;}
  const type& get(Member)const{return *p;}
};

template<typename Member0,typename... Members>
class access<Member0,Members...>:
  public access<Member0>,access<Members...>
{
public:
  template<typename Arg0,typename... Args>
  access(Arg0&& arg0,Args&&... args):
    access<Member0>(std::forward<Arg0>(arg0)),
    access<Members...>(std::forward<Args>(args)...)
  {}
  
  using access<Member0>::get;
  using access<Members...>::get;
};
To access, say, the data labeled as member<int,0> we need to write get(member<int,0>()). The price we have to pay for having data scattered around memory is that the access entity holds several pointers, one for member: on the other hand, the resulting objects, as we will see, really behave as on-the-fly proxies to their associated information, so access entities will seldom be stored. particle can be rewritten so that data is accessed through a generic access object:
template<typename Access>
class particle:Access
{
  using Access::get;
  
  using color=member<char,0>;
  using x=member<int,0>;
  using y=member<int,1>;
  using dx=member<int,2>;
  using dy=member<int,3>;

public:

  static const int max_x=200;
  static const int max_y=100;

  particle(const Access& a):Access(a){}

  void render()const
  {
    std::cout<<"["<<get(x())<<","
      <<get(y())<<","<<int(get(color()))<<"]\n";
  }

  void move()
  {
    get(x())+=get(dx());
    if(get(x())<0){
      get(x())*=-1;
      get(dx())*=-1;
    }
    else if(get(x())>max_x){
      get(x())=2*max_x-get(x());
      get(dx())*=-1;      
    }
    
    get(y())+=get(dy());
    if(get(y())<0){
      get(y())*=-1;
      get(dy())*=-1;
    }
    else if(get(y())>max_y){
      get(y())=2*max_y-get(y());
      get(dy())*=-1;      
    }
  }
};

template<typename Access>
particle<Access> make_particle(Access&& a)
{
  return particle<Access>(std::forward<Access>(a));
}
The transformations that need be done on the source code are not many:
  • Turn the class into a class template dependent on an access entity from which it derives.
  • Rather than declaring internal data members, define the corresponding member labels.
  • Delete former OOP constructors define just one constructor taking an access object as its only data member.
  • Replace mentions of data member by their corresponding access member function invocation (in the example, substitute get(color()) for color, get(x()) for x, etc.)
  • For convenience's sake, provide a make template function (in the example make_particle) to simplify object creation.
Observe how this woks in practice:
using color=member<char,0>;
using x=member<int,0>;
using y=member<int,1>;
using dx=member<int,2>;
using dy=member<int,3>;

char color_=5;
int  x_=20,y_=40,dx_=2,dy_=-1;

auto p=make_particle(access<color,x,y>(&color_,&x_,&y_));
auto q=make_particle(access<x,y,dx,dy>(&x_,&y_,&dx_,&dy_));
p.render();
q.move();
p.render();
The particle data now lives externally as a bunch of separate variables (or, in a more real-life scenario, stored in containers). p and q act as proxies to the same information (i.e., they don't copy data internally) but other than this they provide the same interface as the OOP version of particle, and can be used similarly. Note that the two objects specify different sets of access members, as required by render and move, respectively. So, the following
q.render(); // error
would result in a compile time error as render accesses data that q does not provide. Of course we can do
auto p=make_particle(
         access<color,x,y,dx,dy>(&color_,&x_,&y_,&dx_,&dy_)),
     q=p;
so that the resulting objects can take advantage of the entire particle interface. In later entries we will see how this need not affect performance in traversal algorithms. A nice side effect of this technique is that, when a DOD class is added extra data, former code will continue to work as long as this data is only used in new member functions of the class.
Implementing DOD enablement as a template policy also allows us to experiment with alternative access semantics. For instance, the tuple_storage utility
template<typename Tuple,std::size_t Index,typename... Members>
class tuple_storage_base;

template<typename Tuple,std::size_t Index>
class tuple_storage_base<Tuple,Index>:public Tuple
{
  struct inaccessible{};
public:
  using Tuple::Tuple;
  
  void get(inaccessible);
  
  Tuple&       tuple(){return *this;}
  const Tuple& tuple()const{return *this;}
};

template<
  typename Tuple,std::size_t Index,
  typename Member0,typename... Members
>
class tuple_storage_base<Tuple,Index,Member0,Members...>:
  public tuple_storage_base<Tuple,Index+1,Members...>
{
  using super=tuple_storage_base<Tuple,Index+1,Members...>;
  using type=typename Member0::type;

public:
  using super::super;
  using super::get;
  
  type&       get(Member0)
                {return std::get<Index>(this->tuple());}
  const type& get(Member0)const
                {return std::get<Index>(this->tuple());}  
};

template<typename... Members>
class tuple_storage:
  public tuple_storage_base<
    std::tuple<typename Members::type...>,0,Members...
  >
{
  using super=tuple_storage_base<
    std::tuple<typename Members::type...>,0,Members...
  >;
  
public:
  using super::super;
};
can we used to replace the external access policy with an object containing the data proper:
using storage=tuple_storage<color,x,y,dx,dy>;
auto r=make_particle(storage(3,100,10,10,-15));
auto s=r;
r.render();
r.move();
r.render();
s.render(); // different data than r
which effectively brings us back the old OOP class with ownership semantics. (Also, it is easy to implement an access policy on top of tuple_storage that gives proxy semantics for tuple-based storage. This is left as an exercise for the reader.)
A C++11 example program is provided that puts to use the ideas we have presented.
Traversal is at the core of DOD, as the paradigm is oriented towards handling large numbers of like objects. In a later entry we will extend this framework to provide for easy object traversal and measure the resulting performance as compared with OOP.