After estimating via simulation the kinship distribution K(n) of our simple population model, let us try to solve the problem analytically. As was stated before, K(n) = 0 for n odd, so we will concentrate only on even values of n. First we will calculate K(n) recursively by solving the cases K(0), K(2) and the inductive step K(n + 2). In what follows N is the (constant) size of any generation and x and y denote arbitrary individuals of the same generation. We also define
s(n) := n·r(n)/∑m m·r(m),
S := ∑ n·s(n),
D(n) := 1 − (K(0) + K(2) + ··· + K(n)) = P(k(x,y) > n).
s(n) expresses the probability that an individual have n siblings including herself, as proved in a prior entry.
K(0): this value is simply the probability that x and y are the same individual:
K(0) = P(x = y) = 1/N.
K(2) = P(x and y are siblings) =
= ∑n > 0 P(x and her siblings add up to n) P(y is one of x's siblings) =
= ∑n > 0 s(n)(n − 1)/N = (S − 1)/N.
P(k(x,y) = n + 2) = P(k(x,y) = n + 2 | k(x,y) > n) · P(k(x,y) > n) =
= P(k(x,y) = n + 2 | k(x,y) > n) · D(n),
where we have just conditioned the event "k(x,y) = n + 2" to the enclosing event "k(x,y) > n".
Now, from the very definition of the kinship proximity function, k(x,y) > n if and only if
k(m(x),m(y)) > n − 2,
k(m(x),f(y)) > n − 2,
k(f(x),m(y)) > n − 2,
k(f(x),f(y)) > n − 2;
so
P(k(x,y) = n + 2 | k(x,y) > n) =
= P([k(m(x),m(y)) = n | k(m(x),m(y)) > n − 2] or
[k(m(x),f(y)) = n | k(m(x),f(y)) > n − 2] or
[k(f(x),m(y)) = n | k(f(x),m(y)) > n − 2] or
[k(f(x),f(y)) = n | k(f(x),f(y)) > n − 2]).
Without further justification (though the final results are consistent), we assume that these clauses are statistically independent:
P(k(x,y) = n + 2 | k(x,y) > n)=
= 1 − P([k(m(x),m(y)) ≠ n | k(m(x),m(y)) > n − 2] and
[k(m(x),f(y)) ≠ n | k(m(x),f(y)) > n − 2] and
[k(f(x),m(y)) ≠ n | k(f(x),m(y)) > n − 2] and
[k(f(x),f(y)) ≠ n | k(f(x),f(y)) > n − 2]) =
= 1 − P(k(m(x),m(y)) ≠ n | k(m(x),m(y)) > n − 2) ·
· P(k(m(x),f(y)) ≠ n | k(m(x),f(y)) > n − 2) ·
· P(k(f(x),m(y)) ≠ n | k(f(x),m(y)) > n − 2) ·
· P(k(f(x),f(y)) ≠ n | k(f(x),f(y)) > n − 2).
As kinship proximity is not affected by the individuals' sex (beyond level 0), the four factors have the same value and the expression is equivalent to:
P(k(x,y) = n + 2 | k(x,y) > n)=
1 − (1 − P(k(x',y') = n | k(x',y') > n − 2))4,
where x' and y' are arbitrary individuals of the generation preceding x and y. Using basic probability properties we have
P(k(x',y') = n | k(x',y') > n − 2) =
P(k(x',y') = n and k(x',y') > n − 2)/P(k(x',y') > n − 2) =
P(k(x',y') = n)/P(k(x',y') > n − 2) =
K(n)/D(n − 2),
and thus
K(n + 2) = D(n)(1 − (1 − K(n)/D(n − 2))4) =
= D(n)(1 − ((D(n − 2) − K(n))/D(n − 2))4) =
= D(n)(1 − (D(n)/D(n − 2))4)=
= D(n) − D(n)5/D(n − 2)4.
To summarize, the recursive calculation of K(n) can be performed as follows:
- K(0) ← 1/N
- D(0) ← (N − 1)/N
- K(2) ← (S − 1)/N
- D(2) ← (N − S)/N
- K(n + 2) ← D(n) − D(n)5/D(n − 2)4
- D(n + 2) ← D(n) − K(n + 2)
We can also provide a closed formula for K(n). The recursive equation
K(n + 2) = D(n) − D(n)5/D(n − 2)4
can be rewritten as
D(n) − D(n + 2) = D(n) − D(n)5/D(n − 2)4
or
D(n + 2) = D(n)5/D(n − 2)4.
If we define
d(n) := ln D(2n)
the equality can be expressed then as
d(n + 2) = 5d(n + 1) − 4d(n),
which is a standard second-order linear recurrence equation. The roots of the associated characteristic polynomial are 1 and 4, so the classical theory of linear recurrence equations tells us that d(n) can be expressed as
d(n) = a + b·4n,
D(n) = A·B2n,
K(0) = 1/N,
K(n) = A · (B2n − 2 − B2n), n ≥ 2,
A = (N − 1)4/3/(N·(N − S)1/3),
B = ((N − S)/(N − 1))1/3,
which is the closed expression we were after. This formula matches perfectly the experimental results we had previously obtain via simulation. A further confirmation that the analysis is consistent comes from the fact that ∑ K(n) = 1, as it must be for a probability distribution: ∑ K(n) = 1 − limn → ∞D(n), and D(n) = A·B2n tends to 0 when n → ∞ because B = ((N − S)/(N − 1))1/3 < 1 (S ≥ R and R = 2 by hypothesis).