Spatial Genetics

Article review by Jason Hammond and Sekson Sirisubtawee

R. A. Fisher.The wave of advance of advantageous genes , Ann. Eugenics 7:353–369, 1937.

Model

The main purpose of this paper is to try to mathematically describe how an advantageous mutation of a gene will spread linearly through a habitat. Let us define the following variables:

p = the frequency of the mutant gene
q = the frequency of the parnt allelomorph, i.e. q=(1-p)
m = the intensity of selection in favor of the mutant gene; assumed to be independent of p
k = coefficient of diffusion; assumed to be constant over the entire domain
x = the spatial variable, $x\in (-\infty ,\infty )$
t = time, measured in generations

With these definions p must satisfy the following partial differential equation along the linear habitat for

{\frac {\partial p}{\partial t}}=k{\frac {\partial ^{2}p}{\partial x^{2}}}+mpq

Waves of Stationary Form

One way to look at this model is to look for a stationary wave solution. Next we will diverge slightly from the technique used by Fisher, but will arrive at the same result. To represent a wave of stationary form advancing with velocity v, we may set:

s=x-vt

With this substitution we change the PDE into the following 2nd order ODE:

k{\frac {d^{2}p}{ds^{2}}}+v{\frac {dp}{ds}}+mpq=0

If we let ${\frac {dp}{ds}}=w$ then we have the following first oder system of ODE's:

\left[{\begin{array}{c}p'\\w'\end{array}}\right]={\begin{cases}w\\{\frac {1}{k}}(-vw-mp(1-p))\end{cases}}

Using linear Stability analysis we find there are two equilibria; (p,w)=(0,0)&(1,0), and the Jacobian matrix is given by:

J\left[{\begin{array}{c}p,w\end{array}}\right]=\left[{\begin{array}{cc}0&1\\-{\frac {m}{k}}+2{\frac {m}{k}}p&-{\frac {v}{k}}\end{array}}\right]

The eigenvalues associated with J[0,0] are

-{\frac {v}{2k}}\pm {\frac {1}{2k}}{\sqrt {v^{2}-4mk}}

and the eigenvalues associated with J[1,0] are

-{\frac {v}{2k}}\pm {\frac {1}{2k}}{\sqrt {v^{2}+4mk}}

Thus if $v^{2}>4mk$ then (0,0) is a stable equilibrium and (1,0) is an unstable saddle equilibrium, and if $v^{2}<4mk$ then (0,0) is a stable spiral equilibrium and (1,0) is an unstable saddle equilibrium. So the only situation we can realistically have is the case when $v^{2}>4mk$ , because in the other case the solutions would allow p to be negative (but $p\in [0,1]$ ). Fisher comes to this result on the bottom of page 356. Another point to notice is that when ${\frac {d^{2}p}{ds^{2}}}=0$ there is a point of inflection at which the change of p with respect to s is largest in magnitude (i.e. largest gradient). This occurs when $w={\frac {-mpq}{v}}$ . We will solve the problem numerically and see that this minimum occurs at the intersection of the solution curve and the horizontal nullcline in the phase portrait.

Example Solutions with $v^{2}>4mk$

We will now construct an example with m=k=1 and v=2 to illustrate some solutions. We used an initial value (p,w) very close to the unstable equilibrium at (1,0) to generate these plots:

Phase plane plot

Frequency plot compared with Fisher's Frequency plot on page 360.They match quite well. And we can see that we achieved the same p-value at the max gradient.

Gradient plot

The tabulation of the wave form for c=1

It is possible to get the numerical value of z for each value of p from 0 to 1, and therefore to construct the wave. The following three stages are the process for constructing the wave :

(a) An expansion of z in terms of p is obtained for the neighborhood of p =1,

(b) At any point on the curve dz/dp is calculated from the differential equation, and from this, and preceding values, then the next point on the curve is obtained.

(c) Since ${\frac {dz}{dp}}\rightarrow \infty$ as $p\rightarrow 0$ at a certain stage, a series of values of p for given z are obtained by interpolation, and the process continued using ${\frac {dp}{dz}}$ instead of ${\frac {dz}{dp}}$ .

If $z={\sqrt {2}}-1+aq+bq^{2}+cq^{3}+...$ when $q$ is small, by substitution in the differential equation, and equating power of $q$ , we find that $a=0.10582293,\,b=0.0538084,\,c=0.0341074$ . However, the fourth coefficient is numerically about 0.02417. The value of z corresponding to p = 0.96 is z = 0.41853482, then the value of ${\frac {dz}{dp}}$ can be calculated from ${\frac {dz}{dp}}={\frac {1}{pq}}\left({\frac {1}{z}}+2qz-2-z\right).$ When $pq$ is as small as 0.0384, the error in ${\frac {dz}{dp}}$ is about 170 times as big as that in z with the opposite direction. When $pq$ increases, the error in ${\frac {dz}{dp}}$ becomes less than 100 times that in z. The increment added to z to give the next value becomes sufficiently accurate.

If $D$ is the operation of differentiation, $\Delta$ is the forward differentiation, and $\nabla$ is the backward differentiation, then

\Delta z=\left\{1+{\frac {1}{2}}\nabla +{\frac {5}{12}}\nabla ^{2}+{\frac {3}{8}}\nabla ^{3}+...\right\}Dz=1+{\frac {1}{2}}\nabla (1+{\frac {5}{6}}\nabla (1+{\frac {9}{10}}\nabla (1+...,

where three differences are used.

To minimize initial errors eight places were used in the values of z for p=0.96, 0.95 and 0.94. The second differences of ${\frac {dz}{dp}}$ show the slight oscillation ( this is shown as in Table II in the original paper.) When the process is continued, ${\frac {dz}{dp}}$ and its differences increase. Then the third and the fourth differences is appreciable at about p = 0.70 and p = 0.35, respectively. The values of p corresponding with z = 0.663, 0.664, 0.665 and 9.666 are calculated, using nine digits, then the values of ${\frac {dz}{dp}}$ calculated from the differential equation.

The values from z = 0.667 (in Table III from the original paper) is found by calculating the successive differences from the differential coefficient. The last point calculated gives z=0.875, p=0.000401738, at which stage p is decreasing by more than a quarter of its value at each step.

Numerical Applications from the Paper

The total number of heterozygotes in any length of habitat in comparison with the number of organisms is $\int 2pq\,dx$ on the considered interval. Since $dx=\lambda {\frac {dp}{pqz}}$ , $\int 2pq\,dx=2\lambda \int {\frac {dp}{z}}$ . With the simple algebra, we can have ${\frac {dp}{z}}=2dp-d(pqz)$ . Consequently, the total number of heterozygotes can be expressed in the form,

2\lambda \int {\frac {dp}{z}}=2\lambda (2p-pqz).

If $p=1$ , then the total number of heterozygotes maintained at any time is $4\lambda =4{\sqrt {\left({\frac {k}{m}}\right)}}$ which is proportional to the square root of the coefficient of diffusion, k, and inversely to the square root of the intensity of selection, m.

The effective center of the wave in its advance is the point at which there are as many mutant genes in front as there are parent genes behind. The number of parent genes behind any point is $\int _{-\infty }qdx=\lambda \int ^{1}{\frac {dp}{pz}}$ .

The form $\lambda \int _{0}{\frac {dp}{qz}}$ is unsuitable, since the differential coefficient of z is infinite at $p=0$ , so we have to change such form to make it more appropriate. Since ${\frac {1}{z}}=2-{\frac {d}{dp}}(pqz)$ , $d(pqz)=pzdq+qd(pz)$ , and integration by parts, we get that

{\begin{aligned}\lambda \int _{0}{\frac {dp}{qz}}&=\lambda \left(-2lnq-\int _{0}{\frac {1}{q}}d(pqz)\right)\\&=\lambda \left(-2lnq-pz-\int _{0}pz{\frac {dq}{q}}\right)\\&=\lambda \left(-2lnq-pz-(p+lnq)z-\int ^{1}(p+lnq)dz\right).\end{aligned}}

The term $p+ln(q)$ is of the order of ${\frac {1}{2}}p^{2}$ , and is negligible within the range tabulated.

Here is a concrete situation : Suppose that a mutation giving a selective advantage of 1% is spreading along a continuously occupied shore line, that the standard displacement of young from parents in each generation is 100 yards. Then with m = 0.01/generation, k = 5000 square yards/generation, and $\lambda =717$ yards, the number of heterozygotes is equal to the population of 2868 yards(or more than a mile and a half of coast), even though it is spread over 6 or 8 miles. The rate of advance $v=2{\sqrt {mk}}$ is about 14 yards/generation(or less than 10 miles in 1000 generations.) In order to spread over a habitat of several hundred miles may take 10,000 or 100,000 generations. The effective center in this example is about 210 yards behind the 50% point, while the steepest gradient of gene ratio is about 330 yards in advance of this point.

The proportion of mutant genes is $vg=2pqmz<{\frac {1}{4}}\%$ at its highest point, where p is about 44%. If both homozygotes and heterozygotes can be distinguished with certainty, then the sampling variance of p will be ${\frac {pq}{2n}}$ , while that of the difference from two such counts will be ${\frac {pq}{n}}$ . If n is as high as 10,000, then the standard error is 0.5% when p = 0.5 and the rate of change is 0.244% in each generation, so that about 5 generations must elapse. The rate of change is greatest at the maximum value of