Bannalia: trivial notes on themes diverse: March 2008

Monday, March 31, 2008

Theorem on zero tolerance and behavior control

We formally state and prove the fact about zero-tolerance and progressive penalties described in a prior entry.

Definitions. A utility function is any real function. A penalty is a real non-decreasing function p(x) such that p(x) = 0 if x ≤ x_legal, for some fixed x_legal value. The penalty is said to be zero-tolerance if it has the form p₀ + r(x) for x < x_legal, with p₀ > 0 and r(x) non-decreasing. On the other hand, p(x) is progressive if it is continuous.

Theorem. Let u(x) be a continuous unimodal utility function with maximum at x_max > x_legal and p(x) = p₀ + r(x) a zero-tolerance penalty such that u(x_legal) − ε ≥ w(x) = u(x) − p(x) for x > x_legal and some ε > 0. There is then a progressive penalty p'(x) such that u(x_legal) > u(x) − p'(x) and p'(x) ≤ p(x) for x > x_legal.

Proof. Choose some continuous strictly increasing function d(x) on [x_legal, x_max] with d(x_legal) = 0, d(x) ≤ ε in all its domain (for instance, d(x) = ε(x − x_legal)/(x_max − x_legal) ) and define p'(x) on [x_legal, x_max] as

p'(x) = u(x) − u(x_legal) + d(x),

which is clearly non-decreasing given the unimodality of u(x). Then, for any x in (x_legal, x_max] we have:

u(x) − p'(x) = u(x_legal) − d(x) < u(x_legal),

and also:

p'(x) = u(x) − u(x_legal) + d(x) ≤ u(x) − u(x_legal) + ε ≤ p(x),

where the last inequality follows directly from the theorem hypothesis on p(x). To complete our definition of p'(x), extend it for x > x_max as p'(x) = p'(x_max). p'(x) is then clearly a progressive penalty and verifies all the conditions of the theorem.

It is not possible to relax the theorem hypothesis to merely requiring that u(x_legal) > w(x) = u(x) − p(x) for x > x_legal; the figure shows a counterexample:

The zero-tolerance penalty function p(x) tends to u(x_max) − u(x_legal) when x tends to x_max from the left, even though p(x_max) > u(x_max) − u(x_legal). So, for any continuous p'(x) ≤ p(x) we will have p'(x_max) ≤ u(x_max) − u(x_legal), making p'(x) an inefficient penalty for u(x). Anyway, this limit case has no real-life significance.

Thursday, March 27, 2008

Epistemics of the two-army problem

The so-called two-army problem is well known among students of computer networks: Two allied armies A and B are physically separated and must decide on a common time to make a joint attack, with the additional constraint that they can only communicate through an unreliable channel. Let's say A sends a message to B proposing to attack at 9:00 AM. If B receives the message it will send an acknowledegment back to A, but as the communication channel is faulty B cannot know for sure whether the acknowledgement is received, thus making the decision to attack at 9:00 a risky one in case A does not join. Both armies can send acknowledgements back and forth to increase their confidence that they have agreed to make the joint attack at 9:00 AM, but it can be easily proved that no amount of message exchange through the unreliable channel can guarantee 100% confidence --the key argument is that if there were a last message from one army to the other settling the agreement for good, the uncertainty on whether the message got through would render this allegedly definitive message inconclusive.

This puzzle is used to illustrate the limitations of unreliable communication channels, but as I see it the more fundamental limitation is not related to communication unreliability, but rather to the impossibility of acquiring common knowledge: Let us introduce the modal operators K_A and K_B stating that A and B respectively know some given fact. So, if φ is the statement "A proposes to attack at 9:00 AM", the successive, succesful exchanges between A and B will render the following statements true:

K_Bφ,
K_AK_Bφ,
K_BK_AK_Bφ,
K_AK_BK_AK_Bφ...

But in order for both armies to have complete confidence that they will make the joint attack at the stipulated time, we would need to establish an infinite amount of statements of the form:

(K_AK_B)ⁿφ and K_B(K_AK_B)ⁿφ, n = 0,1,2,...

The properties of epistemic logic imply that K_Aφ → K_AK_Aφ → K_AK_AK_Aφ → ..., which follows from the common-sensical fact that if somebody knows something, she knows that she knows it. But we don't have this sort of introspection capacity when two knowing agents are involved.

Suppose now that there were some special mode of acquiring knowledge of a certain proposition that we will call colearning, such that when φ is colearnt by A it follows that K_Aφ is colearnt by B, and viceversa. By recursion, B colearning φ implies the whole set of sentences (K_AK_B)ⁿφ and K_B(K_AK_B)ⁿφ described above.

Colearning is impossible in the presence of an unreliable communication channel. But we will see in a later entry that we can implement almost sure colearning, i.e. colearning with probability one.

Monday, March 24, 2008

Convexity in Zⁿ: hull algorithm

We devise an efficient algorithm for the computation of the wavefront hull of a bounded set in Zⁿ.

Definition. Let x be a point of Zⁿ, r ≥ 0. We define the set V_rx = {x+rδ_i, x−rδ_i: i = 1,...,n}, where δ_i is the element of Zⁿ with i-th coordinate equal to 1, 0 elsewhere. V_rx is the set of extreme points of B_rx.

Lemma. If m is a nonnegative integer, B_2mx = B_mV_mx (proof trivial).

Corollary. x is in H_wave(X) iff V_mx ≤ B_mX for some nonnegative integer m.

Theorem. Let X be a subset of Zⁿ completely contained in some half-space H = {x in Zⁿ : x_k≤ b} and p=(p₁,...,b,...,p_n) a point on the boundary of H. The points p−iδ_k, 0≤i<dist(p,X), do not belong to H_wave(X).

Proof. For any nonnegative integer m > dist(p,X), p' = p−iδ_k+mδ_k belongs to V_m(p−iδ_k) and does not belong to H. The distance from p' to an arbitrary point x in X is then dist(p',x) = dist(p,x) + m − i ≥ dist(p,X) + m − i > m. So, V_m(p − iδ_k) is not in B_mX and hence the point does not belong to H_wave(X).

Observation. The same result applies for half-spaces of the form H = {x in Zⁿ : x_k≥ b}, changing signs as appropriate.

Theorem. Let X be a bounded subset of Zⁿ and x a point outside H_wave(X). There is then a coordinate k and a sign σ in {+1,−1} such that for any half-space H = {y in Zⁿ : σy_k≤ b} enclosing XU{x}, dist(x,p) < dist(p,X) where p=(x₁,...,b,...,x_n).

Proof. As x does not belong to H_wave(X), V_mx is not entirely included in B_mX for any m. Let choose an m greater than max{dist(x,x') : x' in X}: there is a then a point x' = x+mδ_k (or x−mδ_k) outside B_mX; withous loss of generality, let us assume the first variant, which corresponds to σ = +1. By our choice of m, x' lies outside some half-space {y in Zⁿ : y_k≤ b'} which encloses XU{x}. Let H = {y in Zⁿ : y_k≤ b} be another arbitrary half-space enclosing XU{x} and p = (x₁,...,b,...,x_n). Using the fact that the k-th coordinates of x' and p are both no less than the k-th coordinate of any point of XU{x}, we have that dist(x,p) = dist(x,x') + p − x_k− m and dist(p,X) = dist(x',X) + p − x_k− m. Since dist(x,x') = m < dist(x',X), it follows that dist(x,p) < dist(p,X).

These theorems allow us to design an efficient algorithm for computing the wavefront hull of a bounded set X:

wavefront hull(X)
{determine the minimum X-enclosing prism P}
P ← [a₁,b₁]x···x[a_n,b_n]
for i = 1 to n
····for every p = (p₁,...,p_n) with p_i = 0 and a_j ≤ p_j ≤ b_j, j ≠ i
········for k = 0 to dist(p+a_iδ_i,X) − 1
············mark(p+(a_i+k)δ_i)
········end for
········for k = 0 to dist(p+b_iδ_i,X) − 1
············mark(p+(b_i−k)δ_i)
········end for
····end for
end for
{return the points of P that have not been marked}

I have written an implementation of the algorithm for Z²; you will need Boost to compile the code. The program accepts from the console input a description of the set X for which to calculate the hull as a number of rows where a character * denotes an element of X and any other character stands for a point outside X, like for instance:

*...*..*
*....*..
***....·
.....**·

Sunday, March 16, 2008

On certain types of binary relations

Given a binary relation R, the ancestral of R, denoted by R^*, is the binary relation which holds between objects a and b iff there is a chain a = x₁, x₂,..., x_n = b such that Rx_ix_i+1 for i = 1,..., n − 1. The following is Frege's celebrated definition of the ancestral relation in second-order logic:

R^*ab ↔ for every R-hereditary P (for all x (Rax → Px) → Pb),
P is R-hereditary ↔ for all x, y (Px and Rxy → Py).

A unary predicate P is R-hereditary if it is "transmitted" from a given object to its children. a is then an ancestor of b iff every R-hereditary property of the children of a is also a property of b. An interesting aspect of this clever definition is its apparent circularity: the property of being a descendent of a, which is represented by R^*_a, the unary predicate obtained from R^* by fixing the first argument to a, is R-hereditary and true of a's children. The following exposes even more clearly the relationship between R^* and its definition clause:

R^*ab ↔ for all R-compatible Q (Qab)
Q is R-compatible ↔ for all x, y, z ((Rxy → Qxy) and (Qxy and Ryz → Qxz)).

Again, R^* is R-compatible, and any R-compatible relation must hold for at least the pairs xy for which R^* holds; this is a particular example of the following definition schema for some binary relation S:

Sab ↔ for all Q (Φ_S(Q) → Qab),

where Φ_S is a formula with a binary relation free variable. We might call Φ_S a core formula for S; in a sense, S is the "miminum" relation satisfying Φ_S. For instance, a core formula for the equality relation is

Φ₌(Q) := for all x (Qxx),

that is,

a = b ↔ for all Q (for all x (Qxx) → Qab),

which can be interpreted as the statement that = is the minimum reflexive relation; this definition looks more conservative than the customary Leibniz's Law used to define equality in second-order logic.

Any binary relation definable in first-order logic, i.e. for which there is a first-order definition formula φ(x,y) such that

Sab ↔ φ(a,b),

has a core formula: take Φ_S(Q) := for all x,y (Qxy ↔ φ(x,y)). The converse, however, is not true: for instance, neither R^* nor = are definable via first-order formulas.

Tuesday, March 11, 2008

Zero tolerance and behavior control

Suppose that a given aspect of societal behavior can be quantified according to a numerical value x, so that values of x greater than some threshold x_legal are illegal, and more harmful to society as x grows: examples of this are speeding and alcohol driving limits, tax evasion, etc.

What determines the choice of behavior, i.e. the preferred x, for a given individual? If we denote by u(x) the utility of x for the individual, this will choose the x at which u(x) is maximum. In the absence of any law regulating this behavior, u(x) is composed of two kind of components:

Positive factors that make the individual utility grow as x increases: Pleasure from speeding, money evaded, etc.
Negative factors detracting from the utility as x increases: Fear of an accident, moral concerns, etc. By definition, law penalties are not included here.

To simplify things, let us assume that positive and negative factors contribute in such a way that u(x) increases up to a single maximum at x_max and then decreases from that point on --in other words, u(x) is a unimodal function. When there is no law enforcement, if x_max ≤ x_legal the individual is a law-abiding citizen, otherwise she is a law offender.

Fig. 1: unimodal utility function.

Consider now the introduction of a law penalty p(x) which is null for x ≤ x_legal and non-decreasing for x ≥ x_legal. Under this new scenario, the resulting individual utility is

w(x) = u(x) − p(x).

For law-abiding individuals, the penalty has no effect and thus the resulting behavior is unaffected. For law offenders, the utility of their original position decreases; if p(x) is high enough for all x > x_legal, the maximum of w(x) will be reached then at x_legal and the former offender will opt for law abidance.

There are several approaches to designing an effective penalty function. A zero-tolerance policy is one for which even small offenses on the vicinity of legal behavior are punished severely and without exception. Within our model, a zero-tolerance penalty has the form:

p(x) = p₀ + r(x), x > x_legal,

where p₀ > 0 and r(x) is a positive, non-decreasing function (for x > x_legal) which we call the residual penalty. p₀ is the minimum penalty an offender will get regardless of the degree of the offence. The greater p₀, the tougher the policy is. On the other hand, we say that p(x) is a progressive penalty if it is nondecreasing and continuous (zero-tolerance penalties are obviously not continuous at x_legal). Please note that the characterizations given here for zero-tolerance and progressive penalties are my own and should not be taken as universally known or accepted. The following two figures show the effect of a zero-tolerance and a progressive penalty on the utility function of a law offender.

Fig. 2: impact on utility of a zero-tolerance penalty.

Fig. 3: impact on utility of a progressive penalty.

In both cases the penalty makes the individual opt for a legal position, but the zero-tolerance policy does it at the cost of stricter than necessary punishment for small offences, as stated by the following

Fact. For any zero-tolerance penalty p(x) succesfully enforcing law on a utility function u(x) there exists an equally effective progressive penalty p'(x) which is less severe than p(x), i.e. p'(x) ≤ p(x) for all x.

(We will formally state and prove this rather simple result in a later entry. Informally speaking, we can transformate p(x) into a progressive p'(x) in such a way that the gap u(x_legal) − w(x) tends to 0 as x approaches x_legal --for a zero-tolerance penalty this gaps tends to some p₀ > 0.)

The fact above is the main reason why I find it hard to justify the alleged benefits of zero-tolerance approaches to illegal behavior control as compared with traditional progressive penalties.

Friday, March 7, 2008

Convexity in Zⁿ

We study the generalized concept of convexity defined at a prior entry as applied to the set of n-tuples of integers, Zⁿ, with the L₁ distance:

dist((x₁,...,x_n),(y₁,...,y_n)) := ∑|y_i−x_i|.

The picture displays several sets in Z² (black cells) along with their wavefront hulls (black and gray cells). The resulting sets do not look at all like traditional convex sets (they do not even have to be connected), however they retain the operational properties of convexity (closed under intersections, etc.)

We investigate now how to calculate wavefront hulls in Zⁿ.

Theorem. Zⁿ is regular.

Proof. Let X, Y in Zⁿ such that B_rX ≤ B_rY, s ≥ r. Since B_tx = B_floor(t₎x for any radius t, we can assume without loss of generality that r and s are nonnegative integers. If x belongs to B_sX \ B_rX there exists some x' in X such that r < dist(x',x) = ∑|x_i−x'_i| = ∑σ_iδ_i ≤ s, where δ_i = x_i−x'_i, σ_i = sign(δ_i). It is easily seen that we can choose nonnegative integers η₁,...,η_n such η_i ≤ σ_iδ_i and ∑η_i = r. If we define x'' with coordinates x''_i = x'_i + σ_iη_i, we have dist(x',x'') = r, dist(x'',x) = dist(x',x) − r ≤ s − r, hence x belongs to B_s₋_rB_rX. Since by hypothesis B_rX ≤ B_rY, we conclude that x also belongs to B_s₋_rB_rY ≤ B_sY.

Lemma. If a metric space S is regular, r ≤ s → CB_rCB_rX ≤ CB_sCB_sX for all X in S.

Proof. If x belongs to CB_rCB_rX then B_rx ≤ B_rX, and by the regularity of S B_sx ≤ B_sX, i.e. x belongs to CB_sCB_sX.

Theorem. If X in Zⁿ is bounded, H_wave(X) = CB_rCB_rX for some r ≥ 0.

Proof. Since prisms (cartesian products of n integer intervals) can be easily proved to be convex, H_wave(X) is contained in a finite prism enclosing X, and hence H_wave(X) is finite itself, so we can drop all but finitely many terms in the expression U_r≥0CB_rCB_rX = H_wave(X). By the prior lemma, the remaning elements are contained in that with the largest r.

This theorem allows us to devise a very primitive algorithm to calculate H_wave(X): traverse all the points of the minimun enclosing prism and compute for each point x whether B_rx ≤ B_rX with a large enough value of r (although this would need proof, r = diameter(X) seems to suffice). We will develop more sophisticated techniques at a later entry.

Monday, March 3, 2008

Swo extensions and C++: examples

As we have seen in a prior entry, swo extensions are an alternative formulation of the partition-based semantics used to describe the acceptable arguments to C++ binary search algorithms. This means that if a sequence [start, finish) of elements of type A is ordered with respect to some swo less_A and we have an swo extension less_AB with interface:

struct less_AB
{
 bool operator()(const A&,const B&);
 bool operator()(const B&,const A&);
};

then we can use C++ binary search algorithms with arguments of type B, using less_AB as the comparison predicate.

The following are examples of swo extensions with some practical interest:

If A = std::tuple<A1,...,An> and less_A is a lexicographical swo on A, we can define a swo extension less_AB on B = std::tuple<A1,...,Am>, m≤n, as follows: comparing a = (a1,...,an) and b = (b1,...bm) is equivalent to comparing b' = (a1,...,am) and b with the lexicographical order on B obtained by using the first m componenents of less_A.

Let A1,...,An be types forming a single-inheritance hierarchy tree T rooted at A1, <_T a swo between types induced by a preorder traversal of T and a predicate less_A on polymorphic objects of base type A1 such that less_A()(x,y) iff type(x) <_T type(y). Given a fixed Ai we define the predicate less_Ai by less_Ai()(x,y) = less_A()(x,y) if either x or y is not of type Ai or derived, false otherwise. Technically speaking, less_Ai is not a swo extension of less_A because ranges coincide, but it can be shown to be equivalent in all respects to a true swo extension (replace the left and right arguments by some set isomorphic to the original but different to it). In practical terms, the existence of this swo extension shows that we can binary search for all objects derived from some fixed type Ai if the objects are sorted according to a preorder traversal of their type hierarchy. This is obvious once we observe that preorder traversals "pack" elements belonging to the same hierarchy subtree, i.e. they render them adjacently. This property of preorder traversals with respect to single inheritance hierarchies is used in some languages for implementing efficient dispatch algorithms.

Subscribe to: Posts ( Atom )