## Monday, March 31, 2008

### Theorem on zero tolerance and behavior control

We formally state and prove the fact about zero-tolerance and progressive penalties described in a prior entry.

Definitions. A utility function is any real function. A penalty is a real non-decreasing function p(x) such that p(x) = 0 if xxlegal, for some fixed xlegal value. The penalty is said to be zero-tolerance if it has the form p0 + r(x) for x < xlegal, with p0 > 0 and r(x) non-decreasing. On the other hand, p(x) is progressive if it is continuous.

Theorem. Let u(x) be a continuous unimodal utility function with maximum at xmax > xlegal and p(x) = p0 + r(x) a zero-tolerance penalty such that u(xlegal) − εw(x) = u(x) − p(x) for x > xlegal and some ε > 0. There is then a progressive penalty p'(x) such that u(xlegal) > u(x) − p'(x) and p'(x) ≤ p(x) for x > xlegal.

Proof. Choose some continuous strictly increasing function d(x) on [xlegal, xmax] with d(xlegal) = 0, d(x) ≤ ε in all its domain (for instance, d(x) = ε(xxlegal)/(xmaxxlegal) ) and define p'(x) on [xlegal, xmax] as

p'(x) = u(x) − u(xlegal) + d(x),

which is clearly non-decreasing given the unimodality of u(x). Then, for any x in (xlegal, xmax] we have:

u(x) − p'(x) = u(xlegal) − d(x) < u(xlegal),

and also:

p'(x) = u(x) − u(xlegal) + d(x) ≤ u(x) − u(xlegal) + εp(x),

where the last inequality follows directly from the theorem hypothesis on p(x). To complete our definition of p'(x), extend it for x > xmax as p'(x) = p'(xmax). p'(x) is then clearly a progressive penalty and verifies all the conditions of the theorem.

It is not possible to relax the theorem hypothesis to merely requiring that u(xlegal) > w(x) = u(x) − p(x) for x > xlegal; the figure shows a counterexample:

The zero-tolerance penalty function p(x) tends to u(xmax) − u(xlegal) when x tends to xmax from the left, even though p(xmax) > u(xmax) − u(xlegal). So, for any continuous p'(x) ≤ p(x) we will have p'(xmax) ≤ u(xmax) − u(xlegal), making p'(x) an inefficient penalty for u(x). Anyway, this limit case has no real-life significance.

## Thursday, March 27, 2008

### Epistemics of the two-army problem

The so-called two-army problem is well known among students of computer networks: Two allied armies A and B are physically separated and must decide on a common time to make a joint attack, with the additional constraint that they can only communicate through an unreliable channel. Let's say A sends a message to B proposing to attack at 9:00 AM. If B receives the message it will send an acknowledegment back to A, but as the communication channel is faulty B cannot know for sure whether the acknowledgement is received, thus making the decision to attack at 9:00 a risky one in case A does not join. Both armies can send acknowledgements back and forth to increase their confidence that they have agreed to make the joint attack at 9:00 AM, but it can be easily proved that no amount of message exchange through the unreliable channel can guarantee 100% confidence --the key argument is that if there were a last message from one army to the other settling the agreement for good, the uncertainty on whether the message got through would render this allegedly definitive message inconclusive.

This puzzle is used to illustrate the limitations of unreliable communication channels, but as I see it the more fundamental limitation is not related to communication unreliability, but rather to the impossibility of acquiring common knowledge: Let us introduce the modal operators KA and KB stating that A and B respectively know some given fact. So, if φ is the statement "A proposes to attack at 9:00 AM", the successive, succesful exchanges between A and B will render the following statements true:

KBφ,
KAKBφ,
KBKAKBφ,
KAKBKAKBφ...

But in order for both armies to have complete confidence that they will make the joint attack at the stipulated time, we would need to establish an infinite amount of statements of the form:

(KAKB)nφ and KB(KAKB)nφ, n = 0,1,2,...

The properties of epistemic logic imply that KAφKAKAφKAKAKAφ → ..., which follows from the common-sensical fact that if somebody knows something, she knows that she knows it. But we don't have this sort of introspection capacity when two knowing agents are involved.

Suppose now that there were some special mode of acquiring knowledge of a certain proposition that we will call colearning, such that when φ is colearnt by A it follows that KAφ is colearnt by B, and viceversa. By recursion, B colearning φ implies the whole set of sentences (KAKB)nφ and KB(KAKB)nφ described above.

Colearning is impossible in the presence of an unreliable communication channel. But we will see in a later entry that we can implement almost sure colearning, i.e. colearning with probability one.

## Monday, March 24, 2008

### Convexity in Zn: hull algorithm

We devise an efficient algorithm for the computation of the wavefront hull of a bounded set in Zn.

Definition. Let x be a point of Zn, r ≥ 0. We define the set Vrx = {x+i, xi: i = 1,...,n}, where δi is the element of Zn with i-th coordinate equal to 1, 0 elsewhere. Vrx is the set of extreme points of Brx.

Lemma. If m is a nonnegative integer, B2mx = BmVmx (proof trivial).

Corollary. x is in Hwave(X) iff VmxBmX for some nonnegative integer m.

Theorem. Let X be a subset of Zn completely contained in some half-space H = {x in Zn : xk b} and p=(p1,...,b,...,pn) a point on the boundary of H. The points pk, 0≤i<dist(p,X), do not belong to Hwave(X).

Proof. For any nonnegative integer m > dist(p,X), p' = pk+k belongs to Vm(pk) and does not belong to H. The distance from p' to an arbitrary point x in X is then dist(p',x) = dist(p,x) + midist(p,X) + mi > m. So, Vm(pk) is not in BmX and hence the point does not belong to Hwave(X).

Observation. The same result applies for half-spaces of the form H = {x in Zn : xk b}, changing signs as appropriate.

Theorem. Let X be a bounded subset of Zn and x a point outside Hwave(X). There is then a coordinate k and a sign σ in {+1,−1} such that for any half-space H = {y in Zn : σyk b} enclosing XU{x}, dist(x,p) < dist(p,X) where p=(x1,...,b,...,xn).

Proof. As x does not belong to Hwave(X), Vmx is not entirely included in BmX for any m. Let choose an m greater than max{dist(x,x') : x' in X}: there is a then a point x' = x+k (or xk) outside BmX; withous loss of generality, let us assume the first variant, which corresponds to σ = +1. By our choice of m, x' lies outside some half-space {y in Zn : yk b'} which encloses XU{x}. Let H = {y in Zn : yk b} be another arbitrary half-space enclosing XU{x} and p = (x1,...,b,...,xn). Using the fact that the k-th coordinates of x' and p are both no less than the k-th coordinate of any point of XU{x}, we have that dist(x,p) = dist(x,x') + p xk m and dist(p,X) = dist(x',X) + p xk m. Since dist(x,x') = m < dist(x',X), it follows that dist(x,p) < dist(p,X).

These theorems allow us to design an efficient algorithm for computing the wavefront hull of a bounded set X:

wavefront hull(X)
{determine the minimum X-enclosing prism P}
P ← [a1,b1]x···x[an,bn]
for i = 1 to n
····
for every p = (p1,...,pn) with pi = 0 and ajpjbj, ji
········for k = 0 to dist(p+aiδi,X) − 1
············mark(p+(ai+k)δi)
········end for
········for k = 0 to dist(p+biδi,X) − 1
············mark(p+(bik)δi)
········end for
····end for
end for
{return the points of P that have not been marked}

I have written an implementation of the algorithm for Z2; you will need Boost to compile the code. The program accepts from the console input a description of the set X for which to calculate the hull as a number of rows where a character `*` denotes an element of X and any other character stands for a point outside X, like for instance:

`*...*..**....*..***....·.....**·`

## Sunday, March 16, 2008

### On certain types of binary relations

Given a binary relation R, the ancestral of R, denoted by R*, is the binary relation which holds between objects a and b iff there is a chain a = x1, x2,..., xn = b such that Rxixi+1 for i = 1,..., n − 1. The following is Frege's celebrated definition of the ancestral relation in second-order logic:

R*ab ↔ for every R-hereditary P (for all x (RaxPx) → Pb),
P is R-hereditary ↔ for all x, y (Px and RxyPy).

A unary predicate P is R-hereditary if it is "transmitted" from a given object to its children. a is then an ancestor of b iff every R-hereditary property of the children of a is also a property of b. An interesting aspect of this clever definition is its apparent circularity: the property of being a descendent of a, which is represented by R*a, the unary predicate obtained from R* by fixing the first argument to a, is R-hereditary and true of a's children. The following exposes even more clearly the relationship between R* and its definition clause:

R*ab ↔ for all R-compatible Q (Qab)
Q is R-compatible ↔ for all x, y, z ((RxyQxy) and (Qxy and RyzQxz)).

Again, R* is R-compatible, and any R-compatible relation must hold for at least the pairs xy for which R* holds; this is a particular example of the following definition schema for some binary relation S:

Sab ↔ for all Q (ΦS(Q) → Qab),

where ΦS is a formula with a binary relation free variable. We might call ΦS a core formula for S; in a sense, S is the "miminum" relation satisfying ΦS. For instance, a core formula for the equality relation is

Φ=(Q) := for all x (Qxx),

that is,

a = b ↔ for all Q (for all x (Qxx) → Qab),

which can be interpreted as the statement that = is the minimum reflexive relation; this definition looks more conservative than the customary Leibniz's Law used to define equality in second-order logic.

Any binary relation definable in first-order logic, i.e. for which there is a first-order definition formula φ(x,y) such that

Sabφ(a,b),

has a core formula: take ΦS(Q) := for all x,y (Qxyφ(x,y)). The converse, however, is not true: for instance, neither R* nor = are definable via first-order formulas.

## Tuesday, March 11, 2008

### Zero tolerance and behavior control

Suppose that a given aspect of societal behavior can be quantified according to a numerical value x, so that values of x greater than some threshold xlegal are illegal, and more harmful to society as x grows: examples of this are speeding and alcohol driving limits, tax evasion, etc.

What determines the choice of behavior, i.e. the preferred x, for a given individual? If we denote by u(x) the utility of x for the individual, this will choose the x at which u(x) is maximum. In the absence of any law regulating this behavior, u(x) is composed of two kind of components:

1. Positive factors that make the individual utility grow as x increases: Pleasure from speeding, money evaded, etc.
2. Negative factors detracting from the utility as x increases: Fear of an accident, moral concerns, etc. By definition, law penalties are not included here.

To simplify things, let us assume that positive and negative factors contribute in such a way that u(x) increases up to a single maximum at xmax and then decreases from that point on --in other words, u(x) is a unimodal function. When there is no law enforcement, if xmaxxlegal the individual is a law-abiding citizen, otherwise she is a law offender.

Fig. 1: unimodal utility function.

Consider now the introduction of a law penalty p(x) which is null for xxlegal and non-decreasing for xxlegal. Under this new scenario, the resulting individual utility is

w(x) = u(x) − p(x).

For law-abiding individuals, the penalty has no effect and thus the resulting behavior is unaffected. For law offenders, the utility of their original position decreases; if p(x) is high enough for all x > xlegal, the maximum of w(x) will be reached then at xlegal and the former offender will opt for law abidance.

There are several approaches to designing an effective penalty function. A zero-tolerance policy is one for which even small offenses on the vicinity of legal behavior are punished severely and without exception. Within our model, a zero-tolerance penalty has the form:

p(x) = p0 + r(x), x > xlegal,

where p0 > 0 and r(x) is a positive, non-decreasing function (for x > xlegal) which we call the residual penalty. p0 is the minimum penalty an offender will get regardless of the degree of the offence. The greater p0, the tougher the policy is. On the other hand, we say that p(x) is a progressive penalty if it is nondecreasing and continuous (zero-tolerance penalties are obviously not continuous at xlegal). Please note that the characterizations given here for zero-tolerance and progressive penalties are my own and should not be taken as universally known or accepted. The following two figures show the effect of a zero-tolerance and a progressive penalty on the utility function of a law offender.

Fig. 2: impact on utility of a zero-tolerance penalty.

Fig. 3: impact on utility of a progressive penalty.

In both cases the penalty makes the individual opt for a legal position, but the zero-tolerance policy does it at the cost of stricter than necessary punishment for small offences, as stated by the following

Fact. For any zero-tolerance penalty p(x) succesfully enforcing law on a utility function u(x) there exists an equally effective progressive penalty p'(x) which is less severe than p(x), i.e. p'(x) ≤ p(x) for all x.

(We will formally state and prove this rather simple result in a later entry. Informally speaking, we can transformate p(x) into a progressive p'(x) in such a way that the gap u(xlegal) − w(x) tends to 0 as x approaches xlegal --for a zero-tolerance penalty this gaps tends to some p0 > 0.)

The fact above is the main reason why I find it hard to justify the alleged benefits of zero-tolerance approaches to illegal behavior control as compared with traditional progressive penalties.

## Friday, March 7, 2008

### Convexity in Zn

We study the generalized concept of convexity defined at a prior entry as applied to the set of n-tuples of integers, Zn, with the L1 distance:

dist((x1,...,xn),(y1,...,yn)) := ∑|yixi|.

The picture displays several sets in Z2 (black cells) along with their wavefront hulls (black and gray cells). The resulting sets do not look at all like traditional convex sets (they do not even have to be connected), however they retain the operational properties of convexity (closed under intersections, etc.)

We investigate now how to calculate wavefront hulls in Zn.

Theorem. Zn is regular.

Proof. Let X, Y in Zn such that BrXBrY, sr. Since Btx = Bfloor(t)x for any radius t, we can assume without loss of generality that r and s are nonnegative integers. If x belongs to BsX \ BrX there exists some x' in X such that r < dist(x',x) = ∑|xix'i| = ∑σiδis, where δi = xix'i, σi = sign(δi). It is easily seen that we can choose nonnegative integers η1,...,ηn such ηiσiδi and ∑ηi = r. If we define x'' with coordinates x''i = x'i + σiηi, we have dist(x',x'') = r, dist(x'',x) = dist(x',x) − rsr, hence x belongs to BsrBrX. Since by hypothesis BrXBrY, we conclude that x also belongs to BsrBrYBsY.

Lemma. If a metric space S is regular, rsCBrCBrXCBsCBsX for all X in S.

Proof. If x belongs to CBrCBrX then BrxBrX, and by the regularity of S BsxBsX, i.e. x belongs to CBsCBsX.

Theorem. If X in Zn is bounded, Hwave(X) = CBrCBrX for some r ≥ 0.

Proof. Since prisms (cartesian products of n integer intervals) can be easily proved to be convex, Hwave(X) is contained in a finite prism enclosing X, and hence Hwave(X) is finite itself, so we can drop all but finitely many terms in the expression Ur≥0CBrCBrX = Hwave(X). By the prior lemma, the remaning elements are contained in that with the largest r.

This theorem allows us to devise a very primitive algorithm to calculate Hwave(X): traverse all the points of the minimun enclosing prism and compute for each point x whether BrxBrX with a large enough value of r (although this would need proof, r = diameter(X) seems to suffice). We will develop more sophisticated techniques at a later entry.

## Monday, March 3, 2008

### Swo extensions and C++: examples

As we have seen in a prior entry, swo extensions are an alternative formulation of the partition-based semantics used to describe the acceptable arguments to C++ binary search algorithms. This means that if a sequence `[start, finish)` of elements of type `A` is ordered with respect to some swo `less_A` and we have an swo extension `less_AB` with interface:

`struct less_AB{ bool operator()(const A&,const B&); bool operator()(const B&,const A&);};`
then we can use C++ binary search algorithms with arguments of type `B`, using `less_AB` as the comparison predicate.

The following are examples of swo extensions with some practical interest:

If `A` = `std::tuple<A1,...,An>` and `less_A` is a lexicographical swo on `A`, we can define a swo extension `less_AB` on `B` = `std::tuple<A1,...,Am>`, `m``n`, as follows: comparing `a` = `(a1,...,an)` and `b` = `(b1,...bm)` is equivalent to comparing `b'` = `(a1,...,am)` and `b` with the lexicographical order on `B` obtained by using the first `m` componenents of `less_A`.

Let `A1`,...,`An` be types forming a single-inheritance hierarchy tree T rooted at `A1`, <T a swo between types induced by a preorder traversal of T and a predicate `less_A` on polymorphic objects of base type `A1` such that `less_A()(x,y)` iff type(`x`) <T type(`y`). Given a fixed `Ai` we define the predicate `less_Ai` by `less_Ai()(x,y)` = `less_A()(x,y)` if either `x` or `y` is not of type `Ai` or derived, `false` otherwise. Technically speaking, `less_Ai` is not a swo extension of `less_A` because ranges coincide, but it can be shown to be equivalent in all respects to a true swo extension (replace the left and right arguments by some set isomorphic to the original but different to it). In practical terms, the existence of this swo extension shows that we can binary search for all objects derived from some fixed type `Ai` if the objects are sorted according to a preorder traversal of their type hierarchy. This is obvious once we observe that preorder traversals "pack" elements belonging to the same hierarchy subtree, i.e. they render them adjacently. This property of preorder traversals with respect to single inheritance hierarchies is used in some languages for implementing efficient dispatch algorithms.