Specifying a model

A demographic model specifies population sizes and migration rates as function of time, and it also includes discrete events such as population splittings and admixture. Unlike many coalescent-based simulators, demographic models in dadi are specified forward in time. Also note that all population sizes within a demographic model are specified relative to some reference population size \(N_{\text{ref}}\).

One important subtlety is that within the demographic model function, by default the mutation parameter \(\theta = 4N_{\text{ref}}\mu\) is set to 1. This is because the optimal \(\theta\) for a given model and set of data is trivial to calculate, so dadi by default does this automatically in optimization (so-called "multinomial" optimization). See the Fixed \(\theta\) section for how to fix theta to a particular value in a demographic model.

Implementation

Demographic models are specified by defining a Python function. This function employs various methods defined by dadi to specify the demography.

When defining a demographic function the arguments must be specified in a particular order. The first argument must be a list of free parameters that will be optimized. The second argument (usually called ns) must be a list of sample sizes. The last argument (usually called pts) must be the number of grid points used in the calculation. Any additional arguments (between the second and last) can be used to pass additional non-optimized parameters, using the func_args argument of the optimization methods. (See Listing 8 for an example.) The demographic model function tracks the evolution of \(\phi\) the density of mutations within the populations at given frequencies. This continuous density \(\phi\) is approximated by its values on a grid of points, represented by the numpy array phi. Thus the first step in a demographic model is to specify that grid:

xx = dadi.Numerics.default_grid(pts)

Here pts is the number of grid points in each dimension for representing \(\phi\).

All demographic models employed in dadi must begin with an equilibrium population of non-zero size. \(\phi\) for such a population can be generated using the method PhiManip.phi_1D. The most important parameter to this method is nu, which specifies the relative size of this ancestral population to the reference population. Most often, the reference population is the ancestral, so nu defaults to 1.

Once we've created an initial \(\phi\), we can begin to manipulate it. First, we can split \(\phi\) to simulate population splits. This can be done using the methods PhiManip.phi_1D_to_2D, PhiManip.phi_2D_to_3D_split_1, and PhiManip.phi_2D_to_3D_split_2. These methods take in an input \(\phi\) of either one or two dimensions, and output a \(\phi\) of one greater dimension, corresponding to addition of a population. The added population is the last dimension of \(\phi\). For example, if PhiManip.phi_2D_to_3D_split_1 is used, population 1 will split into populations 1 and 3. phi_2D_to_3D_admix is a more advanced version of the 2D_to_3D methods that incorporates admixture. In this method, the proportions of pop 3 that are derived from pop 1 and pop 2 may be specified.

Direct admixture events can be specified using the methods phi_2D_admix_1_into_2, phi_2D_admix_2_into_1, phi_3D_admix_1_and_2_into_3, phi_3D_admix_1_and_3_into_2, and phi_3D_admix_2_and_3_into_1. These methods do not change the dimensionality of \(\phi\), but rather simulate discrete admixture events. For example, phi_2D_admix_1_into_2 can be used to simulate a large discrete influx of individuals from pop 1 into pop 2. For example, this might model European (pop 1) admixture into indigenous Americans (pop 2). Note that the PhiManip methods for admixture can compromise the effectiveness of extrapolation for evaluating entries in the frequency spectrum corresponding to SNPs private to the recipient population. If your model involves admixture, you may obtain better accuracy by avoiding extrapolation and instead setting pts_l to be a list of length 1. Alternatively, if the admixture is the final event in your model, you can model admixture using the admix_props arguments for Spectrum.from_phi.

Along with these discrete manipulations of \(\phi\), we have the continuous transformations as time passes, due to genetic drift at different population sizes or migration. This is handled by Integration methods, Integration.one_pop, Integration.two_pops, and Integration.three_pops. Each of these methods must be used with a phi of the appropriate dimensionality. Integration.one_pop takes two crucial parameters, T and nu. T specifies the time of this integration and nu specifies the size of this population relative to the reference during this time period. Integration.two_pop takes an integration time T, relative sizes for populations 1 and 2 nu1 and nu2, and migration parameters m12 and m21. The migration parameter m12 specifies the rate of migration from pop2 into pop1. It is equal to the fraction of individuals each generation in pop 1 that are new migrants from pop 2, times the \(2N_{\text{ref}}\). Integration.three_pops is a straightforward extension of two_pops but now there are three population sizes and six migration parameters.

Note that for all these methods, the integration time T must be positive. To ensure this, it is best to define your time parameters as the interval between events rather than the absolute time of those events. For example, a size change happened a time Tsize before a population split Tsplit in the past.

Importantly, population sizes and migration rates (and selection coefficents) may be functions of time. This allows one to simulate exponential growth and other more complex scenarios. To do so, simply pass a function that takes a single argument (the time) and returns the given variable. The Python lambda expression is a convenient way to do this. For example, to simulate a single population growing exponentially from size nu0 to size nuF over a time T, one can do:

nu_func = lambda t: nu0 * (nuF/nu0) ** (t/T)
phi = Integration.one_pop(nu = nu_func, T = T)

Numerous examples are provided in Listings 2 through 8.

Units

The units dadi uses are slightly different than those used by some other programs, ms in particular.

In dadi, \(\theta = 4N_{\text{ref}}\mu\), as is typical.

Times are given in units of \(2N_{\text{ref}}\) generations. This differs from ms, where time is in units of \(4N_{\text{ref}}\) generations. So to convert from a time in dadi to a time in ms, divide by 2.

Migration rates are given in units of \(M_{ij} = 2N_{\text{ref}}m_{ij}\). Again, this differs from ms, where the scaling factor is \(4N_{\text{ref}}\) generations. So to get equivalent migration (\(m_{ij}\) in ms for a given rate in dadi, multiply by 2.

Fixed \(\theta\)

If you wish to set a fixed value of \(\theta = 4N_0\mu\) in your analysis, that information must be provided to the initial \(\phi\) creation function and the Integration functions. For an example, see Listing 7, which defines a demographic model in which \(\theta\) is fixed to be 137 for derived population 1. Derived pop 1 is thus the reference population for specifying all population sizes, so its size is set to 1 in the call to Integration.two_pops. When fixing \(\theta\), every Integration function must be told what the reference \(\theta\) is, using the option theta0. In addition, the methods for creating an initial \(\phi\) distribution must be passed the appropriate value of \(\theta\) using the theta0 option.

Ancient sequences

If you have DNA samples from multiple timepoints, you can construct a frequency spectrum in which different axes correspond to samples from different timepoints. To support this in dadi, the Integration.one_pop, two_pops, and three_pops support freeze arguments. When True, these arguments will "freeze" a particular population so that it no longer changes (although the relationship between SNPs in the frozen and unfrozen populations will change). Note that because time in dadi models is in genetic units, you need to be careful in how you specify the time of collection of your frozen sample. In this case, you likely want to run a model that explicitly includes \(\theta\) as parameter (See the Fixed \(\theta\) section), so that you can convert from physical to genetic units within the model function.

Code examples

def bottleneck(params, ns, pts):
    nuB, nuF, TB, TF = params
    xx = Numerics.default_grid(pts)

    phi = PhiManip.phi_1D(xx)
    phi = Integration.one_pop(phi, xx, TB, nuB)
    phi = Integration.one_pop(phi, xx, TF, nuF)

    fs = Spectrum.from_phi(phi, ns, (xx,))
    return fs

Listing 2 Bottleneck: At time TF + TB in the past, an equilibrium population goes through a bottleneck of depth nuB, recovering to relative size nuF

def growth(params, ns, pts):
    nu, T = params
    xx = Numerics.default_grid(pts)
    phi = PhiManip.phi_1D(xx)

    nu_func = lambda t : numpy.exp(numpy.log(nu) * t/T)
    phi = Integration.one_pop(phi, xx, T, nu_func)

    fs = Spectrum.from_phi(phi, ns, (xx,))
    return fs

Listing 3 Exponential growth: At time T in the past, an equilibrium population begins growing exponentially, reaching size nu at present

def split_mig(params, ns, pts):
    nu1, nu2, T, m = params
    xx = Numerics.default_grid(pts)

    phi = PhiManip.phi_1D(xx)
    phi = PhiManip.phi_1D_to_2D(xx, phi)
    phi = Integration.two_pops(phi, xx, T, nu1, nu2, m12 = m, m21 = m)

    fs = Spectrum.from_phi(phi, ns, (xx, xx))
    return fs

Listing 4 Split with migration: At time T in the past, two populations diverge from an equilibrium population, with relative sizes nu1 and nu2 and with symmetric migration at rate m

def IM(params, ns, pts):
    s, nu1, nu2, T, m12, m21 = params
    xx = Numerics.default_grid(pts)

    phi = PhiManip.phi_1D(xx)
    phi = PhiManip.phi_1D_to_2D(xx, phi)

    nu1_func = lambda t : s * (nu1/s) ** (t/T)
    nu2_func = lambda t : (1-s) * (nu2/(1-s)) ** (t/T)

    phi = Integration.two_pops(phi, xx, T, nu1_func, nu2_func, m12 = m12, m21 = m21)

    fs = Spectrum.from_phi(phi, ns, (xx, xx))
    return fs

Listing 5 Two-population isolation-with-migration: The ancestral population splits into two, with a fraction s going into pop 1 and fraction 1-s into pop 2. The populations then grow exponentially, with asymmetric migration allowed between them

from dadi import Numerics, PhiManip, Integration, Spectrum

def OutOfAfrica(params, ns, pts):
    nuAf, nuB, nuEu0, nuEu, nuAs0, nuAs, mAfB, mAfEu, mAfAs, mEuAs, TAf, TB, TEuAs = params
    xx = Numerics.default_grid(pts)

    phi = PhiManip.phi_1D(xx)
    phi = Integration.one_pop(phi, xx, TAf, nu = nuAf)

    phi = PhiManip.phi_1D_to_2D(xx, phi)
    phi = Integration.two_pops(phi, xx, TB, nu1 = nuAf, nu2 = nuB, m12 = mAfB, m21 = mAfB)

    phi = PhiManip.phi_2D_to_3D_split_2(xx, phi)

    nuEu_func = lambda t : nuEu0 * (nuEu/nuEu0) ** (t/TEuAs)
    nuAs_func = lambda t : nuAs0 * (nuAs/nuAs0) ** (t/TEuAs)
    phi = Integration.three_pops(phi, xx, TEuAs, nu1 = nuAf, nu2 = nuEu_func, nu3 = nuAs_func, m12 = mAfEu, m13 = mAfAs, m21 = mAfEu, m23 = mEuAs, m31 = mAfAs, m32 = mEuAs)

    fs = Spectrum.from_phi(phi, ns, (xx, xx, xx))
    return fs

Listing 6 Out-of-Africa model from Gutenkust (2009): This model involves a size change in the ancestral population, a split, another split, and then exponential growth of populations 1 and 2. (The from dadi import line imports those modules from the dadi namespace into the local namespace, so we don't have to type dadi. to access them.)

def fixed_theta(params, ns, pts):
    nuA, nu2, T = params
    theta1 = 137

    xx = dadi.Numerics.default_grid(pts)

    phi = dadi.PhiManip.phi_1D(xx, nu = nuA, theta0 = theta1)
    phi = dadi.PhiManip.phi_1D_to_2D(xx, phi)
    phi = dadi.Integration.two_pops(phi, xx, T, nu1 = 1, nu2 = nu2, theta0 = theta1)

    fs = dadi.Spectrum.from_phi(phi, ns, (xx, xx))
    return fs

Listing 7 Fixed θ: A split demographic model function with a fixed value of θ = 137 for derived population 1. The free parameters are the sizes of the ancestral pop, nuA, and derived pop 2, nu2, (relative to derived pop 1), along with the divergence time T between the two derived pops

from dadi import Numerics, PhiManip, Integration, Spectrum

def NewWorld(params, ns, fixed_params, pts):
    nuEu0, nuEu, nuAs0, nuAs, nuMx0, nuMx, mEuAs, TEuAs, TMx, fEuMx = params
    theta0, nuAf, nuB, mAfB, mAfEu, mAfAs, TAf, TB = fixed_params
    xx = Numerics.default_grid(pts)

    phi = PhiManip.phi_1D(xx)
    phi = Integration.one_pop(phi, xx, TAf, nu = nuAf)

    phi = PhiManip.phi_1D_to_2D(xx, phi)
    phi = Integration.two_pops(phi, xx, TB, nu1 = nuAf, nu2 = nuB, m12 = mAfB, m21 = mAfB)

    # Integrate out the YRI population
    phi = Numerics.trapz(phi, xx, axis = 0)

    phi = PhiManip.phi_1D_to_2D(xx, phi)
    nuEu_func = lambda t : nuEu0 * (nuEu/nuEu0) ** (t/(TEuAs+TMx))
    nuAs_func = lambda t : nuAs0 * (nuAs/nuAs0) ** (t/(TEuAs+TMx))
    phi = Integration.two_pops(phi, xx, TEuAs, nu1 = nuEu_func, nu2 = nuAs_func, m12 = mEuAs, m21 = mEuAs)

    phi = PhiManip.phi_2D_to_3D_split_2(xx, phi)

    # Initial population sizes for this stretch of integration
    nuEu0 = nuEu_func(TEuAs)
    nuAs0 = nuAs_func(TEuAs)
    nuEu_func = lambda t : nuEu0 * (nuEu/nuEu0) ** (t/TMx)
    nuAs_func = lambda t : nuAs0 * (nuAs/nuAs0) ** (t/TMx)
    nuMx_func = lambda t : nuMx0 * (nuMx/nuMx0) ** (t/TMx)
    phi = Integration.three_pops(phi, xx, TMx, nu1 = nuEu_func, nu2 = nuAs_func, nu3 = nuMx_func, m12 = mEuAs, m21 = mEuAs, m23 = mAsMx, m32 = mAsMx)
    phi = PhiManip.phi_3D_admix_1_and_2_into_3(phi, fEuMx, 0, xx, xx, xx)

    fs = Sepctrum.from_phi(phi, ns, (xx, xx, xx))
    # Apply our theta0. (All previous methods default to theta0 = 1.)
    return theta0 * fs

Listing 8 Settlement-of-New-World model from Gutenkunst (2009): Because dadi is limited to 3 simultaneous populations, we need to integrate out the African population, using Numerics.trapz. This model also employs a fixed θ, and ancillary parameters passed in using the third argument