Monday, June 15, 2009

60/40 split

Tim Harford recounts in his book The Logic of Life an anecdote about the authors of Freakonomics, Steven D. Levitt and Stephen J. Dubner: when discussing the terms of their collaboration on the book, Levitt stated that he wouldn't go for less than a 60/40 split of the revenues; Dubner, for his part, would settle for no less than 60/40 either. The apparent confrontation vanished when they realized that each meant for the other to get 60%, so they decided to go ahead with the venture.

Charming as it may sound, this story cannot be true: if I'm settling for 40% of the revenues and my partner proposes a 60/40 split, the proposal immediately fits my goals no matter who I assume will get each part of the split.

Monday, March 23, 2009

About break statements

In a recent article, Andrew Koenig argues that break statements are harmful because they make it harder to deduce the program context at loop termination. For instance, given:

while (n != 0) { /* do something */ }

n can be assumed to be zero at loop termination unless there is a break within it. Koenig proposes two different approaches to alleviating this problem:

  • Include the program context at the break statement as a possible value for the program context at loop termination.
  • Force the loop termination condition just before breaking.
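As a hedged illustration of the second bullet (one possible reading, not code taken from Koenig's article), the following C++ sketch resets n right before the break, so that n == 0 holds on every exit path. The function drain and its parameters are hypothetical names invented for the example.

```cpp
// Hypothetical example: halve n until it reaches zero, giving up after
// max_steps iterations. Returns the final value of n.
int drain(int n, int max_steps, int& steps_taken) {
    steps_taken = 0;
    while (n != 0) {
        if (steps_taken == max_steps) {
            n = 0;   // force the termination condition before leaving the loop
            break;
        }
        n /= 2;
        ++steps_taken;
    }
    // At this point n == 0 holds, whether the loop ended normally or via break.
    return n;
}
```

The price of this approach is that the original value of n at the break is lost, which may or may not matter to the surrounding code.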

There is another alternative: exclude the loop termination condition from the evaluable program context. Here is a way to do that:

for (int local_n = n; local_n != 0;) { /* do something with local_n */ }

As the scope of local_n does not extend outside the loop, program context after loop termination does not depend on the termination procedure. A cruder, but also more flexible way to do the same is the following:

{
    int local_n = n;
    while (local_n != 0) { /* do something with local_n */ }
}

The somewhat unexpected surrounding braces are a conspicuous reminder that local_n does not form part of the global program state.
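Here is a minimal sketch of the braces idiom with an actual break inside the loop. The function count_halvings and its max_steps parameter are hypothetical, made up only to give the loop something to do.

```cpp
// Hypothetical example: count how many halvings bring n down to zero,
// giving up (via break) after max_steps iterations.
int count_halvings(int n, int max_steps) {
    int steps = 0;
    {
        int local_n = n;   // loop state lives only inside this block
        while (local_n != 0) {
            if (steps == max_steps) break;   // early exit: local_n may be nonzero
            local_n /= 2;
            ++steps;
        }
    }
    // local_n is out of scope here, so no code after the loop can rely on
    // (or be misled by) the assumption local_n == 0. n itself is untouched.
    return steps;
}
```

Calling count_halvings(8, 100) runs the loop to completion, while count_halvings(8, 2) leaves through the break; in both cases the code after the closing brace sees exactly the same set of variables.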

Tuesday, March 17, 2009

A curious syntactic transformation

Consider the sentence

Rationalism has many followers.

What in principle looks like a rather dull N+V+O statement does not, however, allow for seemingly innocuous variations on the object:

*Rationalism has many people.

The reason why the former is valid while the latter is not is that followers are followers of something, in this particular case rationalism:

Rationalism has many followers [of rationalism].

In fact, we can view our sentence as a mere rewording of:

There are many followers of rationalism,

which leads us to hypothesize the existence of the following syntactic transformation:

There is/are [Det] N of NP → NP has/have [Det] N.

For instance:

There were lots of fans of the Beatles → The Beatles had lots of fans.
There are no enemies of Rome → Rome has no enemies.

This transformational rule explains from a purely syntactical perspective why superficially similar sentences like *Rationalism has many people are invalid: they have no "There is..." equivalent.

Interestingly, the rule seems to operate in languages other than English (at least, I presume, in most modern European languages):

El racionalismo tiene muchos seguidores.
Le rationalisme a beaucoup d'adeptes.
Rationalismus hat viele Anhänger.

This points to some general (maybe universal?) mechanism of reification of the possession relationship between nouns.

Tuesday, March 10, 2009

The mythical Bell Curve in Human Resources

Suppose a company's HR department wishes to establish a statistical model for their personnel performance in order to set up outcome predictions for the company's bonus system. It is all too easy to assume that performance will follow a normal distribution such as this:

Fig. 1: Performance distribution under the Bell Curve assumption.

The assumption stems from the deeply held custom in Psychology and Sociology of using the Bell Curve to model any a priori unknown human trait. Actually, if the company's hiring process is indeed efficient in selecting better than average people, this assumption is a complete contradiction.

Consider a simplistic hiring process in which performance is assessed by means of a test, so that applicants scoring at least some fixed minimum on the test are hired, and let us concede for the sake of the argument that the test is a perfect predictor of performance. The resulting performance distribution is depicted in Fig. 2.

Fig. 2: Performance distribution with a test-based selection process.

The distribution has positive skewness, i.e. its right tail is longer than its left tail. So the normal approximation with the same mean and variance (shown as a dotted line) both overestimates the number of low performers and underestimates the number of high performers. It also underestimates low-to-normal performers and overestimates normal-to-high performers.
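For reference, this distribution can be written down explicitly under an assumption the post does not state outright: that applicant performance is standard normal, with density φ and cumulative distribution Φ, and that c is the (hypothetical) minimum test score. The hired population then follows a truncated normal:

```latex
f_{\text{test}}(x) =
\begin{cases}
  \dfrac{\varphi(x)}{1 - \Phi(c)} & \text{if } x \ge c, \\[1ex]
  0 & \text{if } x < c,
\end{cases}
\qquad
\mathbb{E}[X \mid X \ge c] = \frac{\varphi(c)}{1 - \Phi(c)} .
```

The density drops abruptly to zero at c while keeping the full right tail of the normal, which is where the positive skewness comes from.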

In another hiring scenario, the best among N candidates is selected. The resulting distribution is depicted in Fig. 3 for N = 10.

Fig. 3: Performance distribution with best-of-10 selection process.

We have positive skewness again, though not as marked as in the previous case. Skewness grows as N does. Again, the normal approximation results in overestimation of low performers and underestimation of high performers.
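If we assume (this is not stated in the post) that the N candidates are independent draws from a standard normal population with density φ and cumulative distribution Φ, the performance of the selected candidate is the maximum of N such draws, whose distribution is:

```latex
F_N(x) = \Phi(x)^{N},
\qquad
f_N(x) = N \, \varphi(x) \, \Phi(x)^{N-1} .
```

For N = 1 this reduces to the plain normal density; for larger N the factor Φ(x)^{N-1} suppresses the left side of the curve much more strongly than the right side.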

Finally, we consider a two-stage hiring process where applicants are first filtered by a test and then the best candidate out of N is selected.

Fig. 4: Performance distribution with test prefiltering and best-of-10 selection process.

The test filter results in a slightly larger positive skewness. As in previous cases, the normal approximation predicts more low performers and fewer high performers than the real case.
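The three scenarios described so far can be checked numerically with a small Monte Carlo simulation. The C++ sketch below is only an illustration under stated assumptions: applicant performance is standard normal, the test is a perfect predictor with a hypothetical cutoff, and in the two-stage process the best of N is taken among applicants who already passed the test. The names skewness, simulate_hires and kNoTest are invented for the example.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <random>
#include <vector>

// Sample skewness: third central moment divided by the cubed standard deviation.
double skewness(const std::vector<double>& xs) {
    const double count = static_cast<double>(xs.size());
    double mean = 0.0;
    for (double x : xs) mean += x;
    mean /= count;
    double m2 = 0.0, m3 = 0.0;
    for (double x : xs) {
        const double d = x - mean;
        m2 += d * d;
        m3 += d * d * d;
    }
    return (m3 / count) / std::pow(m2 / count, 1.5);
}

// Assumption: applicant performance is standard normal and "passing the test"
// means performance >= cutoff. Each hire is the best of n passing applicants.
// cutoff = kNoTest disables the test stage; n = 1 disables best-of-N selection.
const double kNoTest = -std::numeric_limits<double>::infinity();

std::vector<double> simulate_hires(int hires, int n, double cutoff, unsigned seed) {
    std::mt19937 rng(seed);
    std::normal_distribution<double> applicant(0.0, 1.0);
    // Redraw applicants until one reaches the cutoff (rejection sampling).
    auto draw_passing = [&]() {
        double score;
        do { score = applicant(rng); } while (score < cutoff);
        return score;
    };
    std::vector<double> performance;
    for (int i = 0; i < hires; ++i) {
        double best = draw_passing();
        for (int k = 1; k < n; ++k) best = std::max(best, draw_passing());
        performance.push_back(best);
    }
    return performance;
}
```

With the cutoff placed at the applicant mean (an arbitrary choice), skewness(simulate_hires(100000, 1, 0.0, 42)) estimates the test-only case, skewness(simulate_hires(100000, 10, kNoTest, 42)) the best-of-10 case, and skewness(simulate_hires(100000, 10, 0.0, 42)) the two-stage case. All three estimates should come out positive; the exact values depend on the cutoff, the seed and the standard library's normal sampler.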

To summarize: a hiring process not only results in a personnel performance distribution with a higher than average mean (which is the primary purpose of any hiring process); the distribution will also have positive skewness, with more excellent and fewer deficient people than predicted by the Bell Curve.