Spectrum_mod

Contains Spectrum object, which represents frequency spectra.

`Spectrum`

Bases: masked_array

Represents a frequency spectrum.

Spectra are represented by masked arrays. The masking allows us to ignore specific entries in the spectrum. Most often, these are the absent and fixed categories.

The constructor has the format

fs = dadi.Spectrum(data, mask, mask_corners, data_folded, check_folding, pop_ids, extrap_x) data: The frequency spectrum data mask: An optional array of the same size as data. 'True' entires in this array are masked in the Spectrum. These represent missing data categories. (For example, you may not trust your singleton SNP calling.) mask_corners: If True (default), the 'observed in none' and 'observed in all' entries of the FS will be masked. Typically these entries are unobservable, and dadi cannot reliably calculate them, so you will almost always want mask_corners=True. data_folded: If True, it is assumed that the input data is folded. An error will be raised if the input data and mask are not consistent with a folded Spectrum. check_folding: If True and data_folded=True, the data and mask will be checked to ensure they are consistent with a folded Spectrum. If they are not, a warning will be printed. pop_ids: Optional list of strings containing the population labels. extrap_x: Optional floating point value specifying x value to use for extrapolation.

`Fst()`

Wright's Fst between the populations represented in the fs.

This estimate of Fst assumes random mating, because we don't have heterozygote frequencies in the fs.

Calculation is by the method of Weir and Cockerham Evolution 38:1358 (1984). For a single SNP, the relevant formula is at the top of page 1363. To combine results between SNPs, we use the weighted average indicated by equation 10.

Source code in dadi/Spectrum_mod.py

def Fst(self):
    """
    Wright's Fst between the populations represented in the fs.

    This estimate of Fst assumes random mating, because we don't have
    heterozygote frequencies in the fs.

    Calculation is by the method of Weir and Cockerham _Evolution_ 38:1358
    (1984).  For a single SNP, the relevant formula is at the top of page
    1363. To combine results between SNPs, we use the weighted average
    indicated by equation 10.
    """
    # This gets a little obscure because we want to be able to work with
    # spectra of arbitrary dimension.

    # First quantities from page 1360
    r = self.Npop
    ns = self.sample_sizes
    nbar = numpy.mean(ns)
    nsum = numpy.sum(ns)
    nc = (nsum - numpy.sum(ns**2)/nsum)/(r-1)

    # counts_per_pop is an r+1 dimensional array, where the last axis simply
    # records the indices of the entry. 
    # For example, counts_per_pop[4,19,8] = [4,19,8]
    counts_per_pop = numpy.indices(self.shape)
    counts_per_pop = numpy.transpose(counts_per_pop, axes=list(range(1,r+1))+[0])

    # The last axis of ptwiddle is now the relative frequency of SNPs in
    # that bin in each of the populations.
    ptwiddle = 1.*counts_per_pop/ns

    # Note that pbar is of the same shape as fs...
    pbar = numpy.sum(ns*ptwiddle, axis=-1)/nsum

    # We need to use 'this_slice' to get the proper aligment between
    # ptwiddle and pbar.
    this_slice = [slice(None)]*r + [numpy.newaxis]
    s2 = numpy.sum(ns * (ptwiddle - pbar[tuple(this_slice)])**2, axis=-1)/((r-1)*nbar)

    # Note that this 'a' differs from equation 2, because we've used
    # equation 3 and b = 0 to solve for hbar.
    a = nbar/nc * (s2 - 1/(2*nbar-1) * (pbar*(1-pbar) - (r-1)/r*s2))
    d = 2*nbar/(2*nbar-1) * (pbar*(1-pbar) - (r-1)/r*s2)

    # The weighted sum over loci.
    asum = (self * a).sum()
    dsum = (self * d).sum()

    return asum/(asum+dsum)

`S()`

Segregating sites.

Source code in dadi/Spectrum_mod.py

def S(self):
    """
    Segregating sites.
    """
    oldmask = self.mask.copy()
    self.mask_corners()
    S = self.sum()
    self.mask = oldmask
    return S

`Tajima_D()`

Tajima's D.

Following Gillespie "Population Genetics: A Concise Guide" pg. 45

Source code in dadi/Spectrum_mod.py

def Tajima_D(self):
    """
    Tajima's D.

    Following Gillespie "Population Genetics: A Concise Guide" pg. 45
    """
    if not self.Npop == 1:
        raise ValueError("Only defined on a one-dimensional SFS.")

    S = self.S()

    n = 1.*self.sample_sizes[0]
    pihat = self.pi()
    theta = self.Watterson_theta()

    a1 = numpy.sum(1./numpy.arange(1,n))
    a2 = numpy.sum(1./numpy.arange(1,n)**2)
    b1 = (n+1)/(3*(n-1))
    b2 = 2*(n**2 + n + 3)/(9*n * (n-1))
    c1 = b1 - 1./a1
    c2 = b2 - (n+2)/(a1*n) + a2/a1**2

    C = numpy.sqrt((c1/a1)*S + c2/(a1**2 + a2) * S*(S-1))

    return (pihat - theta)/C

`Watterson_theta()`

Watterson's estimator of theta.

Note that is only sensible for 1-dimensional spectra.

Source code in dadi/Spectrum_mod.py

def Watterson_theta(self):
    """
    Watterson's estimator of theta.

    Note that is only sensible for 1-dimensional spectra.
    """
    if self.Npop != 1:
        raise ValueError("Only defined on a one-dimensional fs.")

    n = self.sample_sizes[0]
    S = self.S()
    an = numpy.sum(1./numpy.arange(1,n))

    return S/an

`Zengs_E()`

Zeng et al.'s E statistic.

Citation

Zeng et al. "Statistical Tests for Detecting Positive Selection by Utilizing High-Frequency Variants" (2006) Genetics

Source code in dadi/Spectrum_mod.py

def Zengs_E(self):
    """
    Zeng et al.'s E statistic.

    Citation:
        Zeng et al. "Statistical Tests for Detecting Positive Selection by
        Utilizing High-Frequency Variants" (2006) Genetics
    """
    num = self.theta_L() - self.Watterson_theta()

    n = self.sample_sizes[0]

    # See after Eq. 3
    an = numpy.sum(1./numpy.arange(1,n))
    # See after Eq. 9
    bn = numpy.sum(1./numpy.arange(1,n)**2)
    s = self.S()

    # See immediately after Eq. 12
    theta = self.Watterson_theta()
    theta_sq = s*(s-1.)/(an**2 + bn)

    # Eq. 14
    var = (n/(2.*(n-1.)) - 1./an) * theta\
            + (bn/an**2 + 2.*(n/(n-1.))**2 * bn - 2*(n*bn-n+1.)/((n-1.)*an)
               - (3.*n+1.)/(n-1.)) * theta_sq

    return num/numpy.sqrt(var)

`combine_pops(tocombine)`

Combine two or more populations in the fs, treating them as a single pop

Parameters:

Name	Type	Description	Default
`tocombine`	`list`	Unordered set of population numbers to combine, numbering from 1.	required

Note

The populations will alwasy be combined into the slot of the population with the smallest index. For example, if the sample sizes of the spectrum are (1,2,3,4,5) and tocombine=[4,2,1], then the output spectrum will have sample_sizes (7,3,5) when populations 1, 2, and 4 are combined.
The pop_ids of the new population will be the pop_ids of the combined populations with a '+' in between them.

Source code in dadi/Spectrum_mod.py

def combine_pops(self, tocombine):
    """
    Combine two or more populations in the fs, treating them as a single pop

    Args:
        tocombine (list): Unordered set of population numbers to combine,
            numbering from 1.

    Note:
        - The populations will alwasy be combined into the slot of the 
        population with the smallest index. For example, if the sample sizes of 
        the spectrum are (1,2,3,4,5) and tocombine=[4,2,1], then the output spectrum 
        will have sample_sizes (7,3,5) when populations 1, 2, and 4 are combined.

        - The pop_ids of the new population will be the pop_ids of the combined 
        populations with a '+' in between them.
    """
    tocombine = sorted(tocombine)
    result = self
    # Need to combine from highest to lowest index
    for right_pop in tocombine[1:][::-1]: 
        result = result.combine_two_pops([tocombine[0], right_pop])
    # Need to fix pop_id of combined pop
    if self.pop_ids:
        result.pop_ids[tocombine[0]-1] = '+'.join(self.pop_ids[_-1] for _ in tocombine)
    return result

`combine_two_pops(tocombine)`

Combine two populations in the fs, treating them as a single pop

Parameters:

Name	Type	Description	Default
`tocombine`	`list`	Indices for populations being combined (starting from 1)	required

Notes

The two populations will alwasy be combined into the slot of the population with the smallest index. For example, if the sample sizes of the spectrum are (2,3,4,5) and tocombine=[4,2], then the output spectrum will have sample_sizes (2,8,4) when populations 2 and 4 are combined.
The pop_ids of the new population will be the pop_ids of the two combined populations with a '+' in between them.

Source code in dadi/Spectrum_mod.py

def combine_two_pops(self, tocombine):
    """
    Combine two populations in the fs, treating them as a single pop

    Args:
        tocombine (list): Indices for populations being combined (starting from 1)

    Notes:
        - The two populations will alwasy be combined into the slot of the 
        population with the smallest index. For example, if the sample sizes of
        the spectrum are (2,3,4,5) and tocombine=[4,2], then the output spectrum
        will have sample_sizes (2,8,4) when populations 2 and 4 are combined.

        - The pop_ids of the new population will be the pop_ids of the two combined
        populations with a '+' in between them.
    """
    # Calculate new sample sizes
    tocombine = sorted([_-1 for _ in tocombine]) # Account for indexing from 1
    new_ns = list(self.sample_sizes)
    new_ns[tocombine[0]] = self.sample_sizes[tocombine[0]] + self.sample_sizes[tocombine[1]]
    del new_ns[tocombine[1]] # Remove pop being combined away

    # Create new pop ids
    new_pop_ids = None
    if self.pop_ids:
        new_pop_ids = list(self.pop_ids)
        new_pop_ids[tocombine[0]] = '{0}+{1}'.format(self.pop_ids[tocombine[0]], self.pop_ids[tocombine[1]])
        del new_pop_ids[tocombine[1]]

    # Create new spectrum
    new_data = np.zeros(shape=[n+1 for n in new_ns])
    new_fs = Spectrum(new_data, pop_ids=new_pop_ids)
    # Copy over extrapolation info
    new_fs.extrap_x = self.extrap_x

    # Fill new spectrum
    for index in np.ndindex(self.shape):
        new_index = list(index)
        new_index[tocombine[0]] = index[tocombine[0]] + index[tocombine[1]]
        del new_index[tocombine[1]]
        new_index = tuple(new_index)
        new_fs[new_index] += self[index]
        # Mask entry if any of the contributing entries are masked
        new_fs.mask[new_index] = (new_fs.mask[new_index] or self.mask[index])

    return new_fs

`filter_pops(tokeep, mask_corners=True)`

Filter Spectrum to keep only certain populations.

Returns new Spectrum with len(tokeep) populations.

Parameters:

Name	Type	Description	Default
`tokeep`	`int`	Unordered set of population numbers to keep, numbering from 1.	required
`mask_corners`	`bool`	If True, the typical corners of the resulting fs will be masked	`True`

Note

This is similar in practice to the marginalize operation. But here populations are numbered from 1, as in the majority of dadi.

Source code in dadi/Spectrum_mod.py

def filter_pops(self, tokeep, mask_corners=True):
    """
    Filter Spectrum to keep only certain populations.

    Returns new Spectrum with len(tokeep) populations.

    Args:
        tokeep (int): Unordered set of population numbers to keep, numbering from 1.
        mask_corners (bool, optional): If True, the typical corners of the resulting fs will be masked

    Note: 
        This is similar in practice to the marginalize operation. But here
        populations are numbered from 1, as in the majority of dadi.
    """
    toremove = list(range(0, self.ndim))
    for pop_ii in tokeep:
        # Apply -1 factor to account for indexing in marginalize
        toremove.remove(pop_ii-1)
    return self.marginalize(toremove)

`fixed_size_sample(nsamples, only_nonmasked=False)`

Generate a resampled fs from the current one.

Parameters:

Name	Type	Description	Default
`nsamples`	`int`	Number of samples to include in the new FS.	required
`only_nonmasked`	`bool`	If True, only SNPs from non-masked will be resampled. Otherwise, all SNPs will be used.	`False`

Source code in dadi/Spectrum_mod.py

def fixed_size_sample(self, nsamples, only_nonmasked=False):
    """
    Generate a resampled fs from the current one.

    Args:
        nsamples (int): Number of samples to include in the new FS.
        only_nonmasked (bool, optional): If True, only SNPs from non-masked will be resampled. 
                        Otherwise, all SNPs will be used.
    """
    flat = self.flatten()
    if only_nonmasked:
        pvals = flat.data/flat.sum()
        pvals[flat.mask] = 0
    else:
        pvals = flat.data/flat.data.sum()

    sample = numpy.random.multinomial(int(nsamples), pvals)
    sample = sample.reshape(self.shape)

    return dadi.Spectrum(sample, mask=self.mask, pop_ids=self.pop_ids)

`fold()`

Folded frequency spectrum

The folded fs assumes that information on which allele is ancestral or derived is unavailable. Thus the fs is in terms of minor allele frequency. Note that this makes the fs into a "triangular" array.

Note that if a masked cell is folded into non-masked cell, the destination cell is masked as well.

Note also that folding is not done in-place. The return value is a new Spectrum object.

Source code in dadi/Spectrum_mod.py

def fold(self):
    """
    Folded frequency spectrum

    The folded fs assumes that information on which allele is ancestral or
    derived is unavailable. Thus the fs is in terms of minor allele 
    frequency.  Note that this makes the fs into a "triangular" array.

    Note that if a masked cell is folded into non-masked cell, the
    destination cell is masked as well.

    Note also that folding is not done in-place. The return value is a new
    Spectrum object.
    """
    if self.folded:
        raise ValueError('Input Spectrum is already folded.')

    # How many samples total do we have? The folded fs can only contain
    # entries up to total_samples/2 (rounded down).
    total_samples = numpy.sum(self.sample_sizes)

    total_per_entry = self._total_per_entry()

    # Here's where we calculate which entries are nonsense in the folded fs.
    where_folded_out = total_per_entry > int(total_samples/2)

    original_mask = self.mask
    # Here we create a mask that masks any values that were masked in
    # the original fs (or folded onto by a masked value).
    final_mask = numpy.logical_or(original_mask, 
                                  reverse_array(original_mask))

    # To do the actual folding, we take those entries that would be folded
    # out, reverse the array along all axes, and add them back to the
    # original fs.
    reversed = reverse_array(numpy.where(where_folded_out, self, 0))
    folded = numpy.ma.masked_array(self.data + reversed)
    folded.data[where_folded_out] = 0

    # Deal with those entries where assignment of the minor allele is
    # ambiguous.
    where_ambiguous = (total_per_entry == total_samples/2.)
    ambiguous = numpy.where(where_ambiguous, self, 0)
    folded += -0.5*ambiguous + 0.5*reverse_array(ambiguous)

    # Mask out the remains of the folding operation.
    final_mask = numpy.logical_or(final_mask, where_folded_out)

    outfs = Spectrum(folded, mask=final_mask, data_folded=True,
                     pop_ids=self.pop_ids)
    outfs.extrap_x = self.extrap_x
    return outfs

`from_data_dict(data_dict, pop_ids, projections, mask_corners=True, polarized=True)` `staticmethod`

Spectrum from a dictionary of polymorphisms.

Parameters:

Name	Type	Description	Default
`pop_ids`	`list[str]`	list of which populations to make fs for.	required
`projections`	`list[int]`	list of sample sizes to project down to for each population.	required
`mask_corners`	`bool`	If True (default), the 'observed in none' and 'observed in all' entries of the FS will be masked.	`True`
`polarized`	`bool`	If True, the data are assumed to be correctly polarized by `outgroup_allele'. SNPs in which the 'outgroup_allele' information is missing or '-' or not concordant with the segregating alleles will be ignored. If False, any 'outgroup_allele' info present is ignored, and the returned spectrum is folded.	`True`

The data dictionary should be organized as:

{snp_id:{'segregating': ['A','T'],
         'calls': {'YRI': (23,3),
                    'CEU': (7,3)
                    },
         'outgroup_allele': 'T'
        }
}

The 'calls' entry gives the successful calls in each population, in the order that the alleles are specified in 'segregating'. Non-diallelic polymorphisms are skipped.

Source code in dadi/Spectrum_mod.py

@staticmethod
def from_data_dict(data_dict, pop_ids, projections, mask_corners=True,
                   polarized=True):
    """
    Spectrum from a dictionary of polymorphisms.

    Args:
        pop_ids (list[str]): list of which populations to make fs for.
        projections (list[int]): list of sample sizes to project down to for each
                    population.
        mask_corners (bool, optional): If True (default), the 'observed in none' and 'observed 
                    in all' entries of the FS will be masked.
        polarized (bool, optional): If True, the data are assumed to be correctly polarized by 
                `outgroup_allele'. SNPs in which the 'outgroup_allele'
                information is missing or '-' or not concordant with the
                segregating alleles will be ignored.
                If False, any 'outgroup_allele' info present is ignored,
                and the returned spectrum is folded.

    The data dictionary should be organized as:

        {snp_id:{'segregating': ['A','T'],
                 'calls': {'YRI': (23,3),
                            'CEU': (7,3)
                            },
                 'outgroup_allele': 'T'
                }
        }
    The 'calls' entry gives the successful calls in each population, in the
    order that the alleles are specified in 'segregating'.
    Non-diallelic polymorphisms are skipped.
    """
    import dadi.Misc
    cd = dadi.Misc.count_data_dict(data_dict, pop_ids)
    fs = Spectrum._from_count_dict(cd, projections, polarized, pop_ids, 
            mask_corners=mask_corners)
    return fs

`from_data_dict_corrected(data_dict, pop_ids, projections, fux_filename, force_pos=True, mask_corners=True)` `staticmethod`

Spectrum from a dictionary of polymorphisms, corrected for ancestral misidentification.

The correction is based upon

Hernandez, Williamson & Bustamante Mol_Biol_Evol 24:1792 (2007)

Parameters:

Name	Type	Description	Default
`force_pos`	`bool`	If the correction is too agressive, it may leave some small entries in the fs less than zero. If force_pos is true, these entries will be set to zero, in such a way that the total number of segregating SNPs is conserved.	`True`
`fux_filename`	`str`	The name of the file containing the misidentification probabilities.	required

The file (fux_filename) is of the form:

# Any number of comments lines beginning with #
AAA T 0.001
AAA G 0.02
...

Where every combination of three + one bases is considered (order is not important). The triplet is the context and putatively derived allele (x) in the reference species. The single base is the base (u) in the outgroup. The numerical value is 1-f_{ux} in the notation of the paper.

The data dictionary should be organized as:

{snp_id:{'segregating': ['A','T'],
         'calls': {'YRI': (23,3),
                    'CEU': (7,3)
                    },
         'outgroup_allele': 'T',
         'context': 'CAT',
         'outgroup_context': 'CAT'
        }
}

The additional entries are 'context', which includes the two flanking bases in the species of interest, and 'outgroup_context', which includes the aligned bases in the outgroup.

This method skips entries for which the correction cannot be applied. Most commonly this is because of missing or non-constant context.

Source code in dadi/Spectrum_mod.py

@staticmethod
def from_data_dict_corrected(data_dict, pop_ids, projections,
                             fux_filename, force_pos=True,
                             mask_corners=True):
    """
    Spectrum from a dictionary of polymorphisms, corrected for ancestral
    misidentification.

    The correction is based upon:
        Hernandez, Williamson & Bustamante _Mol_Biol_Evol_ 24:1792 (2007)

    Args:
        force_pos (bool): If the correction is too agressive, it may leave some small
                entries in the fs less than zero. If force_pos is true,
                these entries will be set to zero, in such a way that the
                total number of segregating SNPs is conserved.
        fux_filename (str): The name of the file containing the 
                misidentification probabilities.

    The file (fux_filename) is of the form:

        # Any number of comments lines beginning with #
        AAA T 0.001
        AAA G 0.02
        ...

    Where every combination of three + one bases is considered
    (order is not important).  The triplet is the context and
    putatively derived allele (x) in the reference species. The
    single base is the base (u) in the outgroup. The numerical
    value is 1-f_{ux} in the notation of the paper.

    The data dictionary should be organized as:

        {snp_id:{'segregating': ['A','T'],
                 'calls': {'YRI': (23,3),
                            'CEU': (7,3)
                            },
                 'outgroup_allele': 'T',
                 'context': 'CAT',
                 'outgroup_context': 'CAT'
                }
        }

    The additional entries are 'context', which includes the two flanking
    bases in the species of interest, and 'outgroup_context', which
    includes the aligned bases in the outgroup.

    This method skips entries for which the correction cannot be applied.
    Most commonly this is because of missing or non-constant context.
    """
    # Read the fux file into a dictionary.
    fux_dict = {}
    f = open(fux_filename)
    for line in f.readlines():
        if line.startswith('#'):
            continue
        sp = line.split()
        fux_dict[(sp[0], sp[1])] = 1-float(sp[2])
    f.close()

    # Divide the data into classes based on ('context', 'outgroup_allele')
    by_context = Spectrum._data_by_tri(data_dict)

    fs = numpy.zeros(numpy.asarray(projections)+1)
    while by_context:
        # Each time through this loop, we eliminate two entries from the 
        # data dictionary. These correspond to one class and its
        # corresponding misidentified class.
        (derived_tri, out_base), nomis_data = by_context.popitem()

        # The corresponding bases if the ancestral state had been
        # misidentifed.
        mis_out_base = derived_tri[1]
        mis_derived_tri = derived_tri[0] + out_base + derived_tri[2]
        # Get the data for that case. Note that we default to an empty
        # dictionary if we don't have data for that class.
        mis_data = by_context.pop((mis_derived_tri, mis_out_base), {})

        fux = fux_dict[(derived_tri, out_base)]
        fxu = fux_dict[(mis_derived_tri, mis_out_base)]

        # Get the spectra for these two cases
        Nux = Spectrum.from_data_dict(nomis_data, pop_ids, projections)
        Nxu = Spectrum.from_data_dict(mis_data, pop_ids, projections)

        # Equations 5 & 6 from the paper.
        Nxu_rev = reverse_array(Nxu)
        Rux = (fxu*Nux - (1-fxu)*Nxu_rev)/(fux+fxu-1)
        Rxu = reverse_array((fux*Nxu_rev - (1-fux)*Nux)/(fux+fxu-1))

        fs += Rux + Rxu

    # Here we take the negative entries, and flip them back, so they end up
    # zero and the total number of SNPs is conserved.
    if force_pos:
        negative_entries = numpy.minimum(0, fs)
        fs -= negative_entries
        fs += reverse_array(negative_entries)

    return Spectrum(fs, mask_corners=mask_corners, pop_ids=pop_ids)

`from_demes(g, sampled_demes, sample_sizes, pts, log_extrap=False, sample_times=None, Ne=None, ancestral_misid=False)` `staticmethod`

Takes a deme graph and computes the SFS. demes is a package for specifying demographic models in a user-friendly, human-readable YAML format. This function automatically parses the demographic description and returns a SFS for the specified populations and sample sizes.

This function is new in version 1.1.0. Future developments will allow for inference using demes-based demographic descriptions.

Parameters:

Name	Type	Description	Default
`g`	`str or DemeGraph`	A `demes` DemeGraph from which to compute the SFS. The DemeGraph can either be specified as a YAML file, in which case `g` is a string, or as a `DemeGraph` object.	required
`sampled_demes`	`list[str]`	A list of deme IDs to take samples from. We can repeat demes, as long as the sampling of repeated deme IDs occurs at distinct times.	required
`sample_sizes`	`list[int]`	A list of the same length as `sampled_demes`, giving the sample sizes for each sampled deme.	required
`sample_times`	`list[floats]`	If None, assumes all sampling occurs at the end of the existence of the sampled deme. If there are ancient samples, `sample_times` must be a list of same length as `sampled_demes`, giving the sampling times for each sampled deme. Sampling times are given in time units of the original deme graph, so might not necessarily be generations (e.g. if `g.time_units` is years)	`None`
`Ne`	`float`	reference population size. If none is given, we use the initial size of the root deme.	`None`

Returns:

Name	Type	Description
`fs`	`Spectrum`	A `dadi` site frequency spectrum, with dimension equal to the length of `sampled_demes`, and shape equal to `sample_sizes` plus one in each dimension, indexing the allele frequency in each deme from 0 to n[i], where i is the deme index.

Source code in dadi/Spectrum_mod.py

@staticmethod
def from_demes(
    g, sampled_demes, sample_sizes, pts, log_extrap=False, sample_times=None, Ne=None, ancestral_misid=False,):
    """
    Takes a deme graph and computes the SFS. ``demes`` is a package for
    specifying demographic models in a user-friendly, human-readable YAML
    format. This function automatically parses the demographic description
    and returns a SFS for the specified populations and sample sizes.

    This function is new in version 1.1.0. Future developments will allow for
    inference using ``demes``-based demographic descriptions.

    Args:
        g (str or demes.DemeGraph): A ``demes`` DemeGraph from which to compute the SFS. The DemeGraph
            can either be specified as a YAML file, in which case `g` is a string,
            or as a ``DemeGraph`` object.
        sampled_demes (list[str]): A list of deme IDs to take samples from. We can repeat
            demes, as long as the sampling of repeated deme IDs occurs at distinct
            times.
        sample_sizes (list[int]): A list of the same length as ``sampled_demes``,
            giving the sample sizes for each sampled deme.
        sample_times (list[floats], optional): If None, assumes all sampling occurs at the end of the
            existence of the sampled deme. If there are
            ancient samples, ``sample_times`` must be a list of same length as
            ``sampled_demes``, giving the sampling times for each sampled
            deme. Sampling times are given in time units of the original deme graph,
            so might not necessarily be generations (e.g. if ``g.time_units`` is years)
        Ne (float, optional): reference population size. If none is given, we use the initial
            size of the root deme.

    Returns:
        fs (Spectrum): A ``dadi`` site frequency spectrum, with dimension equal to the
            length of ``sampled_demes``, and shape equal to ``sample_sizes`` plus one
            in each dimension, indexing the allele frequency in each deme from 0
            to n[i], where i is the deme index.
    """
    global _imported_demes
    if not _imported_demes:
        try:
            global demes
            global Demes
            import demes
            import dadi.Demes as Demes

            _imported_demes = True
        except ImportError:
            raise ImportError("demes is not installed, need to `pip install demes`")

    if isinstance(g, str):
        dg = demes.load(g)
    else:
        dg = g
    if ancestral_misid==False:
        func_ex = dadi.Numerics.make_extrap_func(Demes.SFS)
    if ancestral_misid==True:
        misid_func = dadi.Numerics.make_anc_state_misid_func(Demes.SFS)
        func_ex = dadi.Numerics.make_extrap_func(misid_func)


    fs = func_ex(
        dg,
        sampled_demes,
        sample_sizes,
        pts=pts,
        sample_times=sample_times,
        Ne=Ne,
    )
    return fs

`from_file(fname, mask_corners=True, return_comments=False)` `staticmethod`

Read frequency spectrum from file.

Parameters:

Name	Type	Description	Default
`fname`	`str`	String with file name to read from. If it ends in .gz, gzip compression is assumed.	required
`mask_corners`	`bool`	If True, mask the 'absent in all samples' and 'fixed in all samples' entries.	`True`
`return_comments`	`bool`	If true, the return value is (fs, comments), where comments is a list of strings containing the comments from the file (without #'s).	`False`

Note

See to_file method for details on the file format.

Source code in dadi/Spectrum_mod.py

@staticmethod
def from_file(fname, mask_corners=True, return_comments=False):
    """
    Read frequency spectrum from file.

    Args:
        fname (str): String with file name to read from. If it ends in .gz, gzip
            compression is assumed.
        mask_corners (bool, optional): If True, mask the 'absent in all samples' and 'fixed in
                    all samples' entries.
        return_comments (bool, optional): If true, the return value is (fs, comments), where
                        comments is a list of strings containing the comments
                        from the file (without #'s).

    Note:
        See [to_file][dadi.Spectrum_mod.Spectrum.to_file] method for details on the file format.
    """
    if fname.endswith('.gz'):
        fid = gzip.open(fname, 'rb')
    else:
        fid = open(fname, 'r')

    line = fid.readline()
    # Strip out the comments
    comments = []
    while line.startswith('#'):
        comments.append(line[1:].strip())
        line = fid.readline()

    # Read the shape of the data
    shape_spl = line.split()
    if 'folded' not in shape_spl and 'unfolded' not in shape_spl:
        # This case handles the old file format
        shape = tuple([int(d) for d in shape_spl])
        folded = False
        pop_ids = None
    else:
        # This case handles the new file format
        shape,next_ii = [int(shape_spl[0])], 1
        while shape_spl[next_ii] not in ['folded', 'unfolded']:
            shape.append(int(shape_spl[next_ii]))
            next_ii += 1
        folded = (shape_spl[next_ii] == 'folded')
        # Are there population labels in the file?
        if len(shape_spl) > next_ii + 1:
            pop_ids = line.split('"')[1::2]
        else:
            pop_ids = None

    data = numpy.fromstring(fid.readline().strip(), 
                            count=numpy.prod(shape), sep=' ')
    # fromfile returns a 1-d array. Reshape it to the proper form.
    data = data.reshape(*shape)

    maskline = fid.readline().strip()
    if not maskline:
        # The old file format didn't have a line for the mask
        mask = None
    else:
        # This case handles the new file format
        mask = numpy.fromstring(maskline, 
                                count=numpy.prod(shape), sep=' ')
        mask = mask.reshape(*shape)

    fs = Spectrum(data, mask, mask_corners, data_folded=folded,
                  pop_ids=pop_ids)

    fid.close()
    if not return_comments:
        return fs
    else:
        return fs,comments

`from_ms_file(fid, average=True, mask_corners=True, return_header=False, pop_assignments=None, pop_ids=None, bootstrap_segments=1)` `staticmethod`

Read frequency spectrum from file of ms output.

Parameters:

Name	Type	Description	Default
`fid`	`str`	string with file name to read from or an open file object.	required
`average`	`bool`	If True, the returned fs is the average over the runs in the ms file. If False, the returned fs is the sum.	`True`
`mask_corners`	`bool`	If True, mask the 'absent in all samples' and 'fixed in all samples' entries.	`True`
`return_header`	`bool`	If True, the return value is (fs, (command,seeds), where command and seeds are strings containing the ms commandline and the seeds used.	`False`
`pop_assignments`	`list[int]`	If None, the assignments of samples to populations is done automatically, using the assignment in the ms command line. To manually assign populations, pass a list of the from [6,8]. This example places the first 6 samples into population 1, and the next 8 into population 2.	`None`
`pop_ids`	`list[str]`	Optional list of strings containing the population labels. If pop_ids is None, labels will be "pop0", "pop1", ...	`None`
`bootstrap_segments`	`int`	If bootstrap_segments is an integer greater than 1, the data will be broken up into that many segments based on SNP position. Instead of single FS, a list of spectra will be returned, one for each segment.	`1`

Source code in dadi/Spectrum_mod.py

@staticmethod
def from_ms_file(fid, average=True, mask_corners=True, return_header=False,
                 pop_assignments=None, pop_ids=None, bootstrap_segments=1):
    """
    Read frequency spectrum from file of ms output.

    Args:
        fid (str): string with file name to read from or an open file object.
        average (bool, optional): If True, the returned fs is the average over the runs in the ms
                file. If False, the returned fs is the sum.
        mask_corners (bool, optional): If True, mask the 'absent in all samples' and 'fixed in
                    all samples' entries.
        return_header (bool, optional): If True, the return value is (fs, (command,seeds), where
                    command and seeds are strings containing the ms
                    commandline and the seeds used.
        pop_assignments (list[int], optional): If None, the assignments of samples to populations is
                        done automatically, using the assignment in the ms
                        command line. To manually assign populations, pass a
                        list of the from [6,8]. This example places
                        the first 6 samples into population 1, and the next 8
                        into population 2.
        pop_ids (list[str], optional): Optional list of strings containing the population labels.
                If pop_ids is None, labels will be "pop0", "pop1", ...
        bootstrap_segments (int, optional): If bootstrap_segments is an integer greater than 1,
                            the data will be broken up into that many segments
                            based on SNP position. Instead of single FS, a list
                            of spectra will be returned, one for each segment.
    """
    newfile = False
    # Try to read from fid. If we can't, assume it's something that we can
    # use to open a file.
    if not hasattr(fid, 'read'):
        newfile = True
        fid = open(fid, 'r')

    # Parse the commandline
    command = line = fid.readline()
    command_terms = line.split()

    if command_terms[0].count('ms'):
        runs = int(command_terms[2])
        try:
            pop_flag = command_terms.index('-I')
            num_pops = int(command_terms[pop_flag+1])
            pop_samples = [int(command_terms[pop_flag+ii])
                           for ii in range(2, 2+num_pops)]
        except ValueError:
            num_pops = 1
            pop_samples = [int(command_terms[1])]
    else:
        raise ValueError('Unrecognized command string: %s.' % command)

    total_samples = numpy.sum(pop_samples)
    if pop_assignments:
        num_pops = len(pop_assignments)
        pop_samples = pop_assignments

    sample_indices = numpy.cumsum([0] + pop_samples)
    bottom_l = sample_indices[:-1]
    top_l = sample_indices[1:]

    seeds = line = fid.readline()
    while not line.startswith('//'):
        line = fid.readline()

    counts = numpy.zeros(len(pop_samples), numpy.int_)
    fs_shape = numpy.asarray(pop_samples) + 1
    dimension = len(counts)

    if dimension > 1:
        bottom0 = bottom_l[0]
        top0 = top_l[0]
        bottom1 = bottom_l[1]
        top1 = top_l[1]
    if dimension > 2:
        bottom2 = bottom_l[2]
        top2 = top_l[2]
    if dimension > 3:
        bottom3 = bottom_l[3]
        top3 = top_l[3]
    if dimension > 4:
        bottom4 = bottom_l[4]
        top4 = top_l[4]
    if dimension > 5:
        bottom5 = bottom_l[5]
        top5 = top_l[5]

    all_data = [numpy.zeros(fs_shape, numpy.int_)
                for boot_ii in range(bootstrap_segments)]
    for run_ii in range(runs):
        line = fid.readline()
        segsites = int(line.split()[-1])

        if segsites == 0:
            # Special case, need to read 3 lines to stay synced.
            for _ in range(3):
                line = fid.readline()
            continue
        line = fid.readline()
        while not line.startswith('positions'):
            line = fid.readline()

        # Read SNP positions for creating bootstrap segments
        positions = [float(_) for _ in line.split()[1:]]
        # Where we should break our interval to create our bootstraps
        breakpts = numpy.linspace(0, 1, bootstrap_segments+1)
        # The indices that correspond to those breakpoints
        break_iis = numpy.searchsorted(positions, breakpts)
        # Correct for searchsorted behavior if last position is 1,
        # to ensure all SNPs are captured
        break_iis[-1] = len(positions)

        # Read the chromosomes in
        chromos = fid.read((segsites+1)*total_samples)

        # For each bootstrap segment, relevant SNPs run from start_ii:end_ii
        for boot_ii, (start_ii, end_ii) \
                in enumerate(zip(break_iis[:-1], break_iis[1:])):
            # Use the data array corresponding to this bootstrap segment
            data = all_data[boot_ii]
            for snp in range(start_ii, end_ii):
                # Slice to get all the entries that refer to a given SNP
                this_snp = chromos[snp::segsites+1]
                # Count SNPs per population, and record them.
                if dimension == 1:
                    data[this_snp.count('1')] += 1
                elif dimension == 2:
                    data[this_snp[bottom0:top0].count('1'), 
                         this_snp[bottom1:top1].count('1')] += 1
                elif dimension == 3:
                    data[this_snp[bottom0:top0].count('1'), 
                         this_snp[bottom1:top1].count('1'),
                         this_snp[bottom2:top2].count('1')] += 1
                elif dimension == 4:
                    data[this_snp[bottom0:top0].count('1'), 
                         this_snp[bottom1:top1].count('1'),
                         this_snp[bottom2:top2].count('1'),
                         this_snp[bottom3:top3].count('1')] += 1
                elif dimension == 5:
                    data[this_snp[bottom0:top0].count('1'), 
                         this_snp[bottom1:top1].count('1'),
                         this_snp[bottom2:top2].count('1'),
                         this_snp[bottom3:top3].count('1'),
                         this_snp[bottom4:top4].count('1')] += 1
                elif dimension == 6:
                    data[this_snp[bottom0:top0].count('1'), 
                         this_snp[bottom1:top1].count('1'),
                         this_snp[bottom2:top2].count('1'),
                         this_snp[bottom3:top3].count('1'),
                         this_snp[bottom4:top4].count('1'),
                         this_snp[bottom5:top5].count('1')] += 1
                else:
                    # This is noticably slower, so we special case the cases
                    # above.
                    for dim_ii in range(dimension):
                        bottom = bottom_l[dim_ii]
                        top = top_l[dim_ii]
                        counts[dim_ii] = this_snp[bottom:top].count('1')
                    data[tuple(counts)] += 1

        # Read to the next iteration
        line = fid.readline()
        line = fid.readline()

    if newfile:
        fid.close()

    all_fs = [Spectrum(data, mask_corners=mask_corners, pop_ids=pop_ids)
              for data in all_data]
    if average:
        all_fs = [fs/runs for fs in all_fs]

    # If we aren't setting up for bootstrapping, return fs, rather than a
    # list of length 1. (This ensures backward compatibility.)
    if bootstrap_segments == 1:
        all_fs = all_fs[0]

    if not return_header:
        return all_fs
    else:
        return all_fs, (command,seeds)

`from_phi(phi, ns, xxs, mask_corners=True, pop_ids=None, admix_props=None, het_ascertained=None, force_direct=False)` `staticmethod`

Compute sample Spectrum from population frequency distribution phi.

Parameters:

Name	Type	Description	Default
`phi`	`array - like`	P-dimensional population frequency distribution.	required
`ns`	`list[int]`	Sequence of P sample sizes for each population.	required
`xxs`	`tuple[int]`	Sequence of P one-dimesional grids on which phi is defined.	required
`mask_corners`	`bool`	If True, resulting FS is masked in 'absent' and 'fixed' entries.	`True`
`pop_ids`	`list[str]`	Optional list of strings containing the population labels. If pop_ids is None, labels will be "pop0", "pop1", ...	`None`
`admix_props`	`tuple[tuple]`	Admixture proportions for sampled individuals. For example, if there are two populations, and individuals from the first pop are admixed with fraction f from the second population, then admix_props=((1-f,f),(0,1)). For three populations, the no-admixture setting is admix_props=((1,0,0),(0,1,0),(0,0,1)). (Note that this option also forces direct integration, which may be less accurate than the semi-analytic method.)	`None`
`het_ascertained`	`str['xx', 'yy', or 'zz']`	If 'xx', then FS is calculated assuming that SNPs have been ascertained by being heterozygous in one individual from population 1. (This individual is not in the current sample.) If 'yy' or 'zz', it assumed that the ascertainment individual came from population 2 or 3, respectively. (Note that this option also forces direct integration, which may be less accurate than the semi-analytic method. This could be fixed if there is interest. Note also that this option cannot be used simultaneously with admix_props.)	`None`
`force_direct`	`bool`	Forces integration to use older direct integration method, rather than using analytic integration of sampling formula.	`False`

Source code in dadi/Spectrum_mod.py

@staticmethod
def from_phi(phi, ns, xxs, mask_corners=True, 
             pop_ids=None, admix_props=None, het_ascertained=None, 
             force_direct=False):
    """
    Compute sample Spectrum from population frequency distribution phi.

    Args:
        phi (array-like): P-dimensional population frequency distribution.
        ns (list[int]): Sequence of P sample sizes for each population.
        xxs (tuple[int]): Sequence of P one-dimesional grids on which phi is defined.
        mask_corners (bool, optional): If True, resulting FS is masked in 'absent' and 'fixed'
                    entries.
        pop_ids (list[str], optional): Optional list of strings containing the population labels.
                If pop_ids is None, labels will be "pop0", "pop1", ...
        admix_props (tuple[tuple], optional): Admixture proportions for sampled individuals. For example,
                    if there are two populations, and individuals from the
                    first pop are admixed with fraction f from the second
                    population, then admix_props=((1-f,f),(0,1)). For three
                    populations, the no-admixture setting is
                    admix_props=((1,0,0),(0,1,0),(0,0,1)). 
                    (Note that this option also forces direct integration,
                    which may be less accurate than the semi-analytic
                    method.)
        het_ascertained (str['xx', 'yy', or 'zz'], optional): If 'xx', then FS is calculated assuming that SNPs have
                        been ascertained by being heterozygous in one
                        individual from population 1. (This individual is
                        *not* in the current sample.) If 'yy' or 'zz', it
                        assumed that the ascertainment individual came from
                        population 2 or 3, respectively.
                        (Note that this option also forces direct integration,
                        which may be less accurate than the semi-analytic
                        method. This could be fixed if there is interest. Note
                        also that this option cannot be used simultaneously
                        with admix_props.)
        force_direct (bool, optional): Forces integration to use older direct integration method,
                    rather than using analytic integration of sampling 
                    formula.
    """
    if admix_props and not numpy.allclose(numpy.sum(admix_props, axis=1),1):
        raise ValueError('Admixture proportions {0} must sum to 1 for all '
                         'populations.' .format(str(admix_props)))
    if not phi.ndim == len(ns) == len(xxs):
        raise ValueError('Dimensionality of phi and lengths of ns and xxs '
                         'do not all agree.')
    if het_ascertained and not het_ascertained in ['xx','yy','zz']:
        raise ValueError("If used, het_ascertained must be 'xx', 'yy', or "
                         "'zz'.")

    if admix_props and het_ascertained:
        error = """admix_props and het_ascertained options cannot be used 
        simultaneously. Instead, please use the PhiManip methods to 
        implement admixture. If this proves inappropriate for your use, 
        contact the the dadi developers, as it may be possible to support
        both options simultaneously in the future."""
        raise NotImplementedError(error)

    if phi.ndim == 1:
        if not het_ascertained and not force_direct:
            fs = Spectrum._from_phi_1D_analytic(ns[0], xxs[0], phi,
                                                mask_corners)
        else:
            fs = Spectrum._from_phi_1D_direct(ns[0], xxs[0], phi, 
                                              mask_corners, het_ascertained)
    elif phi.ndim == 2:
        if not het_ascertained and not admix_props and not force_direct:
            fs = Spectrum._from_phi_2D_linalg(ns[0], ns[1], 
                                              xxs[0], xxs[1], phi,
                                              mask_corners)
        elif not admix_props:
            fs = Spectrum._from_phi_2D_direct(ns[0], ns[1], xxs[0], xxs[1], 
                                              phi, mask_corners, 
                                              het_ascertained)
        else:
            fs = Spectrum._from_phi_2D_admix_props(ns[0], ns[1], 
                                                  xxs[0], xxs[1], 
                                                  phi, mask_corners, 
                                                  admix_props)
    elif phi.ndim == 3:
        if not het_ascertained and not admix_props and not force_direct:
            fs = Spectrum._from_phi_3D_linalg(ns[0], ns[1], ns[2], 
                                              xxs[0], xxs[1], xxs[2],
                                              phi, mask_corners)
        elif not admix_props:
            fs = Spectrum._from_phi_3D_direct(ns[0], ns[1], ns[2], 
                                              xxs[0], xxs[1], xxs[2], 
                                              phi, mask_corners, 
                                              het_ascertained)
        else:
            fs = Spectrum._from_phi_3D_admix_props(ns[0], ns[1], ns[2], 
                                                   xxs[0], xxs[1], xxs[2], 
                                                   phi, mask_corners, 
                                                   admix_props)
    elif phi.ndim == 4:
        if not het_ascertained and not admix_props and not force_direct:
            fs = Spectrum._from_phi_4D_linalg(ns[0], ns[1], ns[2], ns[3],
                                              xxs[0], xxs[1], xxs[2], xxs[3],
                                              phi, mask_corners)
        elif not admix_props:
            fs = Spectrum._from_phi_4D_direct(ns[0], ns[1], ns[2], ns[3],
                                              xxs[0], xxs[1], xxs[2], xxs[3],
                                              phi, mask_corners, 
                                              het_ascertained)
        else:
            fs = Spectrum._from_phi_4D_admix_props(ns[0], ns[1], ns[2], ns[3],
                                                   xxs[0], xxs[1], xxs[2], xxs[3],
                                                   phi, mask_corners, 
                                                   admix_props)
    elif phi.ndim == 5:
        if not het_ascertained and not admix_props and not force_direct:
            fs = Spectrum._from_phi_5D_linalg(ns[0], ns[1], ns[2], ns[3], ns[4],
                                              xxs[0], xxs[1], xxs[2], xxs[3], xxs[4],
                                              phi, mask_corners)
    else:
        raise ValueError('Only implemented for dimensions 1-5.')
    fs.pop_ids = pop_ids
    # Record value to use for extrapolation. This is the first grid point,
    # which is where new mutations are introduced. Note that extrapolation
    # will likely fail if grids differ between dimensions.
    fs.extrap_x = xxs[0][1]
    for xx in xxs[1:]:
        if not xx[1] == fs.extrap_x:
            logger.warning('Spectrum calculated from phi different grids for '
                        'different dimensions. Extrapolation may fail.')
    return fs

`from_phi_inbreeding(phi, ns, xxs, Fs, ploidys, mask_corners=True, pop_ids=None, admix_props=None, het_ascertained=None, force_direct=True)` `staticmethod`

Compute a sample Spectrum from a population frequency distribution (phi) with inbreeding.

This function calculates the frequency spectrum while accounting for inbreeding coefficients and ploidy levels for each population.

Parameters:

Name	Type	Description	Default
`phi`	`array - like`	P-dimensional population frequency distribution.	required
`ns`	`list[int]`	Sequence of P sample sizes for each population.	required
`xxs`	`list[array - like]`	Sequence of P one-dimensional grids on which phi is defined.	required
`Fs`	`list[float]`	Sequence of P inbreeding coefficients for each population.	required
`ploidys`	`list[int]`	Sequence of P ploidy levels for each population.	required
`mask_corners`	`bool`	If True, resulting FS is masked in 'absent' and 'fixed' entries.	`True`
`pop_ids`	`list[str]`	Optional list of strings containing the population labels. If None, labels will be "pop0", "pop1", etc.	`None`
`admix_props`	`list[tuple]`	Admixture proportions for sampled individuals. For example, if there are two populations and individuals from the first population are admixed with fraction f from the second population, then admix_props=((1-f, f), (0, 1)). For three populations, the no-admixture setting is admix_props=((1, 0, 0), (0, 1, 0), (0, 0, 1)). Note: This option forces direct integration, which may be less accurate than the semi-analytic method.	`None`
`het_ascertained`	`str`	If 'xx', the FS is calculated assuming that SNPs have been ascertained by being heterozygous in one individual from population 1. This individual is not in the current sample. If 'yy' or 'zz', it is assumed that the ascertainment individual came from population 2 or 3, respectively. Note: This option forces direct integration, which may be less accurate than the semi-analytic method. This option cannot be used simultaneously with admix_props.	`None`
`force_direct`	`bool`	Forces integration to use the older direct integration method, rather than using analytic integration of the sampling formula.	`True`

Returns:

Name	Type	Description
`fs`	`Spectrum`	The resulting frequency spectrum.

Raises:

Type	Description
`ValueError`	If the dimensionality of phi and lengths of ns, xxs, ploidys, and Fs do not all agree, or if invalid values are provided for het_ascertained.
`NotImplementedError`	If admix_props and het_ascertained are used simultaneously.

Source code in dadi/Spectrum_mod.py

@staticmethod
def from_phi_inbreeding(phi, ns, xxs, Fs, ploidys, mask_corners=True, 
                        pop_ids=None, admix_props=None,
                        het_ascertained=None, force_direct=True):
    """
    Compute a sample Spectrum from a population frequency distribution (phi) with inbreeding.

    This function calculates the frequency spectrum while accounting for inbreeding
    coefficients and ploidy levels for each population.

    Args:
        phi (array-like): P-dimensional population frequency distribution.
        ns (list[int]): Sequence of P sample sizes for each population.
        xxs (list[array-like]): Sequence of P one-dimensional grids on which phi is defined.
        Fs (list[float]): Sequence of P inbreeding coefficients for each population.
        ploidys (list[int]): Sequence of P ploidy levels for each population.
        mask_corners (bool, optional): If True, resulting FS is masked in 'absent' and 'fixed' entries.
        pop_ids (list[str], optional): Optional list of strings containing the population labels.
            If None, labels will be "pop0", "pop1", etc.
        admix_props (list[tuple], optional): Admixture proportions for sampled individuals.
            For example, if there are two populations and individuals from the first
            population are admixed with fraction f from the second population, then
            admix_props=((1-f, f), (0, 1)). For three populations, the no-admixture
            setting is admix_props=((1, 0, 0), (0, 1, 0), (0, 0, 1)).
            Note: This option forces direct integration, which may be less accurate
            than the semi-analytic method.
        het_ascertained (str, optional): If 'xx', the FS is calculated assuming that SNPs have
            been ascertained by being heterozygous in one individual from population 1.
            This individual is not in the current sample. If 'yy' or 'zz', it is assumed
            that the ascertainment individual came from population 2 or 3, respectively.
            Note: This option forces direct integration, which may be less accurate
            than the semi-analytic method. This option cannot be used simultaneously
            with admix_props.
        force_direct (bool, optional): Forces integration to use the older direct integration
            method, rather than using analytic integration of the sampling formula.

    Returns:
        fs (Spectrum): The resulting frequency spectrum.

    Raises:
        ValueError: If the dimensionality of phi and lengths of ns, xxs, ploidys, and Fs
            do not all agree, or if invalid values are provided for het_ascertained.
        NotImplementedError: If admix_props and het_ascertained are used simultaneously.
    """
    if np.all(np.asarray(Fs) == 0):
        return Spectrum.from_phi(phi, ns, xxs, mask_corners=mask_corners, 
                        pop_ids=pop_ids, admix_props=admix_props,
                        het_ascertained=het_ascertained, force_direct=force_direct)
    if admix_props and not numpy.allclose(numpy.sum(admix_props, axis=1),1):
        raise ValueError('Admixture proportions {0} must sum to 1 for all '
                         'populations.' .format(str(admix_props)))
    if not phi.ndim == len(ns) == len(xxs) == len(Fs) == len(ploidys):
        raise ValueError('Dimensionality of phi and lengths of ns, xxs, ploidys, and Fs '
                         'do not all agree.')
    if het_ascertained and not het_ascertained in ['xx','yy','zz']:
        raise ValueError("If used, het_ascertained must be 'xx', 'yy', or "
                         "'zz'.")

    Fs = np.minimum(Fs, 1 - 1e-10)

    if admix_props and het_ascertained:
        error = """admix_props and het_ascertained options cannot be used 
        simultaneously. Instead, please use the PhiManip methods to 
        implement admixture. If this proves inappropriate for your use, 
        contact the the dadi developers, as it may be possible to support
        both options simultaneously in the future."""
        raise NotImplementedError(error)

    if phi.ndim == 1:
        fs = Spectrum._from_phi_1D_direct_inbreeding(ns[0], xxs[0], phi, Fs[0],
                                                     mask_corners, ploidys[0],
                                                     het_ascertained)
    elif phi.ndim == 2:
        fs = Spectrum._from_phi_2D_direct_inbreeding(ns[0], ns[1], xxs[0], xxs[1],
                                                     phi, Fs[0], Fs[1], mask_corners,
                                                     ploidys[0], ploidys[1],
                                                     het_ascertained)
    elif phi.ndim == 3:
        fs = Spectrum._from_phi_3D_direct_inbreeding(ns[0], ns[1], ns[2], 
                                                     xxs[0], xxs[1], xxs[2], 
                                                     phi, Fs[0], Fs[1], Fs[2],
                                                     mask_corners, ploidys[0],
                                                     ploidys[1], ploidys[2],
                                                     het_ascertained)
    else:
        raise ValueError('Only implemented for dimensions 1,2 or 3.')
    fs.pop_ids = pop_ids
    # Record value to use for extrapolation. This is the first grid point,
    # which is where new mutations are introduced. Note that extrapolation
    # will likely fail if grids differ between dimensions.
    fs.extrap_x = xxs[0][1]
    for xx in xxs[1:]:
        if not xx[1] == fs.extrap_x:
            logger.warning('Spectrum calculated from phi different grids for '
                        'different dimensions. Extrapolation may fail.')
    return fs

`from_sfscode_file(fid, sites='all', average=True, mask_corners=True, return_header=False, pop_ids=None)` `staticmethod`

Read a frequency spectrum from an sfs_code output file.

This function parses the output of sfs_code and returns a frequency spectrum based on the specified filtering and formatting options.

Parameters:

Name	Type	Description	Default
`fid`	`str or file object`	Path to the file or an open file object containing sfs_code output.	required
`sites`	`str`	Site type to include in the spectrum. Options are: 'all': Include all sites. 'syn': Include only synonymous mutations. 'nonsyn': Include only non-synonymous mutations.	`'all'`
`average`	`bool`	If True, return the average spectrum across runs; if False, return the sum.	`True`
`mask_corners`	`bool`	If True, mask entries corresponding to alleles absent or fixed in all samples.	`True`
`return_header`	`bool`	If True, return a tuple (fs, (command, seeds)), where `command` and `seeds` are strings containing the ms command line and seed values used.	`False`
`pop_ids`	`list of str`	Population labels. If None, defaults to ["pop0", "pop1", ...].	`None`

Returns:

Name	Type	Description
`fs`	`Spectrum`	The frequency spectrum, optionally with metadata if `return_header` is True.

Source code in dadi/Spectrum_mod.py

@staticmethod
def from_sfscode_file(fid, sites='all', average=True, mask_corners=True, 
                      return_header=False, pop_ids=None):
    """
    Read a frequency spectrum from an sfs_code output file.

    This function parses the output of sfs_code and returns a frequency spectrum
    based on the specified filtering and formatting options.

    Args:
        fid (str or file object): Path to the file or an open file object containing sfs_code output.
        sites (str): Site type to include in the spectrum. Options are:

            - 'all': Include all sites.

            - 'syn': Include only synonymous mutations.

            - 'nonsyn': Include only non-synonymous mutations.

        average (bool): If True, return the average spectrum across runs; if False, return the sum.
        mask_corners (bool): If True, mask entries corresponding to alleles absent or fixed in all samples.
        return_header (bool): If True, return a tuple (fs, (command, seeds)), where `command` and `seeds`
            are strings containing the ms command line and seed values used.
        pop_ids (list of str, optional): Population labels. If None, defaults to ["pop0", "pop1", ...].

    Returns:
        fs (Spectrum): The frequency spectrum, optionally with metadata if `return_header` is True.
    """

    newfile = False
    # Try to read from fid. If we can't, assume it's something that we can
    # use to open a file.
    if not hasattr(fid, 'read'):
        newfile = True
        fid = open(fid, 'r')

    if sites == 'all':
        only_nonsyn, only_syn = False, False
    elif sites == 'syn':
        only_nonsyn, only_syn = False, True
    elif sites == 'nonsyn':
        only_nonsyn, only_syn = True, False
    else:
        raise ValueError("'sites' argument must be one of ('all', 'syn', "
                         "'nonsyn').")

    command = fid.readline()
    command_terms = command.split()

    runs = int(command_terms[2])
    num_pops = int(command_terms[1])

    # sfs_code default is 6 individuals, and I assume diploid pop
    pop_samples = [12] *  num_pops
    if '--sampSize' in command_terms or '-n' in command_terms:
        try:
            pop_flag = command_terms.index('--sampSize')
            pop_flag = command_terms.index('-n')
        except ValueError:
            pass
        pop_samples = [2*int(command_terms[pop_flag+ii])
                       for ii in range(1, 1+num_pops)]

    pop_samples = numpy.asarray(pop_samples)
    pop_fixed_str = [',%s.-1' % i for i in range(num_pops)]
    pop_count_str = [',%s.' % i for i in range(num_pops)]

    seeds = fid.readline()
    line = fid.readline()

    data = numpy.zeros(numpy.asarray(pop_samples)+1, numpy.int_)

    # line = //iteration...
    line = fid.readline()
    for iter_ii in range(runs):
        for ii in range(5):
            line = fid.readline()

        # It is possible for a mutation to be listed several times in the
        # output.  To accomodate this, I keep a dictionary of identities
        # for those mutations, and hold off processing them until I've seen
        # all mutations listed for the iteration.
        mut_dict = {}

        # Loop until this iteration ends.
        while not line.startswith('//') and line != '':
            split_line = line.split(';')
            if split_line[-1] == '\n':
                split_line = split_line[:-1]

            # Loop over mutations on this line.
            for mut_ii, mutation in enumerate(split_line):
                counts_this_mut = numpy.zeros(num_pops, numpy.int_)

                split_mut = mutation.split(',')

                # Exclude synonymous mutations
                if only_nonsyn and split_mut[7] == '0':
                    continue
                # Exclude nonsynonymous mutations
                if only_syn and split_mut[7] == '1':
                    continue

                ind_start = len(','.join(split_mut[:12]))
                by_individual = mutation[ind_start:]

                mut_id = ','.join(split_mut[:4] + split_mut[5:11])

                # Count mutations in each population
                for pop_ii,fixed_str,count_str\
                        in zip(range(num_pops), pop_fixed_str, 
                               pop_count_str):
                    if fixed_str in by_individual:
                        counts_this_mut[pop_ii] = pop_samples[pop_ii]
                    else:
                        counts_this_mut[pop_ii] =\
                                by_individual.count(count_str)

                # Initialize the list that will track the counts for this
                # mutation. Using setdefault means that it won't overwrite
                # if there's already a list stored there.
                mut_dict.setdefault(mut_id, [0]*num_pops)
                for ii in range(num_pops):
                    if counts_this_mut[ii] > 0 and mut_dict[mut_id][ii] > 0:
                        sys.stderr.write('Contradicting counts between '
                                         'listings for mutation %s in '
                                         'population %i.' 
                                         % (mut_id, ii))
                    mut_dict[mut_id][ii] = max(counts_this_mut[ii], 
                                               mut_dict[mut_id][ii])

            line = fid.readline()

        # Now apply all the mutations with fixations that we deffered.
        for mut_id, counts in mut_dict.items():
            if numpy.any(numpy.asarray(counts) > pop_samples):
                sys.stderr.write('counts_this_mut > pop_samples: %s > '
                                 '%s\n%s\n' % (counts, pop_samples, mut_id))
                counts = numpy.minimum(counts, pop_samples)
            data[tuple(counts)] += 1

    if newfile:
        fid.close()

    fs = Spectrum(data, mask_corners=mask_corners, pop_ids=pop_ids)
    if average:
        fs /= runs

    if not return_header:
        return fs
    else:
        return fs, (command,seeds)

`log()`

Return the natural logarithm of the entries of the frequency spectrum.

Only necessary because numpy.ma.log now fails to propagate extra attributes after numpy 1.10.

Source code in dadi/Spectrum_mod.py

def log(self):
    """
    Return the natural logarithm of the entries of the frequency spectrum.

    Only necessary because numpy.ma.log now fails to propagate extra
    attributes after numpy 1.10.
    """
    logfs = numpy.ma.log(self)
    logfs.folded = self.folded
    logfs.pop_ids = self.pop_ids
    logfs.extrap_x = self.extrap_x
    return logfs

`marginalize(over, mask_corners=True)`

Reduced dimensionality spectrum summing over some populations.

Parameters:

Name	Type	Description	Default
`over`	`list of int`	sequence of axes to sum over. For example (0,2) will sum over populations 0 and 2.	required
`mask_corners`	`bool`	If True, the typical corners of the resulting fs will be masked	`True`

Source code in dadi/Spectrum_mod.py

def marginalize(self, over, mask_corners=True):
    """
    Reduced dimensionality spectrum summing over some populations.

    Args:
        over (list of int): sequence of axes to sum over. For example (0,2) will sum over
            populations 0 and 2.
        mask_corners (bool, optional): If True, the typical corners of the resulting fs will be
                    masked
    """
    original_folded = self.folded
    # If we started with an folded Spectrum, we need to unfold before
    # marginalizing.
    if original_folded:
        output = self.unfold()
    else:
        output = self.copy()

    orig_mask = output.mask.copy()
    orig_mask.flat[0] = orig_mask.flat[-1] = False
    if numpy.any(orig_mask):
        logger.warning('Marginalizing a Spectrum with internal masked values. '
                    'This may not be a well-defined operation.')

    # Do the marginalization
    for axis in sorted(over)[::-1]:
        output = output.sum(axis=axis)
    pop_ids = None
    if self.pop_ids is not None:
        pop_ids = list(self.pop_ids)
        for axis in sorted(over)[::-1]:
            del pop_ids[axis]
    output.folded = False
    output.pop_ids = pop_ids
    output.extrap_x = self.extrap_x

    if mask_corners:
        output.mask_corners()

    # Return folded or unfolded as original.
    if original_folded:
        return output.fold()
    else:
        return output

`mask_corners()`

Mask the 'seen in 0 samples' and 'seen in all samples' entries.

Source code in dadi/Spectrum_mod.py

def mask_corners(self):
    """
    Mask the 'seen in 0 samples' and 'seen in all samples' entries.
    """
    self.mask.flat[0] = self.mask.flat[-1] = True

`pi()`

Estimated expected number of pairwise differences between two chromosomes in the population.

Note that this estimate includes a factor of sample_size / (sample_size - 1) to make E( π̂ ) = θ.

Source code in dadi/Spectrum_mod.py

def pi(self):
    """
    Estimated expected number of pairwise differences between two
    chromosomes in the population.

    Note that this estimate includes a factor of sample_size / (sample_size - 1)
    to make E( π̂ ) = θ.
    """
    if self.ndim != 1:
        raise ValueError("Only defined for a one-dimensional SFS.")

    n = self.sample_sizes[0]
    # sample frequencies p 
    p = numpy.arange(0,n+1,dtype=float)/n
    # This expression derives from Gillespie's _Population_Genetics:_A
    # _Concise_Guide_, 2nd edition, section 2.6.
    return n/(n-1.) * 2*numpy.ma.sum(self*p*(1-p))

`project(ns)`

Project to smaller sample size.

Parameters:

Name	Type	Description	Default
`ns`	`int`	Sample sizes for new spectrum.	required

Returns:

Name	Type	Description
`fs`	`Spectrum`	Spectrum object with sample sizes ns.

Source code in dadi/Spectrum_mod.py

def project(self, ns):
    """
    Project to smaller sample size.

    Args:
        ns (int): Sample sizes for new spectrum.

    Returns:
        fs (Spectrum): Spectrum object with sample sizes ns.
    """
    if len(ns) != self.Npop:
        raise ValueError('Requested sample sizes not of same dimension '
                         'as spectrum. Perhaps you need to marginalize '
                         'over some populations first?')
    if numpy.any(numpy.asarray(ns) > numpy.asarray(self.sample_sizes)):
        raise ValueError('Cannot project to a sample size greater than '
                         'original. Original size is %s and requested size '
                         'is %s.' % (self.sample_sizes, ns))

    original_folded = self.folded
    # If we started with an folded Spectrum, we need to unfold before
    # projecting.
    if original_folded:
        output = self.unfold()
    else:
        output = self.copy()

    # Iterate over each axis, applying the projection.
    for axis,proj in enumerate(ns):
        if proj != self.sample_sizes[axis]:
            output = output._project_one_axis(proj, axis)

    output.pop_ids = self.pop_ids
    output.extrap_x = self.extrap_x

    # Return folded or unfolded as original.
    if original_folded:
        return output.fold()
    else:
        return output

`reorder_pops(neworder)`

Get Spectrum with populations in new order

Parameters:

Name	Type	Description	Default
`neworder`	`list[int]`	Integer list defining new order of populations, indexing the orginal populations from 1. Must contain all integers from 1 to number of pops.	required

Returns:

Name	Type	Description
`fs`	`Spectrum`	New Spectrum with same number of populations, but in a different order

Source code in dadi/Spectrum_mod.py

def reorder_pops(self, neworder):
    """
    Get Spectrum with populations in new order

    Args:
        neworder (list[int]): Integer list defining new order of populations, indexing the orginal
                populations from 1. Must contain all integers from 1 to number of pops.

    Returns:
        fs (Spectrum): New Spectrum with same number of populations, but in a different order
    """
    if sorted(neworder) != [_+1 for _ in range(self.ndim)]:
        raise(ValueError("neworder argument misspecified"))
    newaxes = [_-1 for _ in neworder]
    fs = self.transpose(newaxes)
    if self.pop_ids:
        fs.pop_ids = [self.pop_ids[_] for _ in newaxes]

    return fs

`sample()`

Generate a Poisson-sampled fs from the current one.

Note

Entries where the current fs is masked will be masked in the output sampled fs.

Source code in dadi/Spectrum_mod.py

def sample(self):
    """
    Generate a Poisson-sampled fs from the current one.

    Note:
        Entries where the current fs is masked will be masked in the
          output sampled fs.
    """
    import scipy.stats
    # These are entries where the sampling has no meaning, b/c fs is masked.
    bad_entries = self.mask
    # We convert to a 1-d array for passing into the sampler
    means = self.ravel().copy()
    # Filter out those bad entries.
    means[bad_entries.ravel()] = 1
    # Sample
    samp = scipy.stats.distributions.poisson.rvs(means, size=len(means))
    # Replace bad entries with zero
    samp[bad_entries.ravel()] = 0
    # Convert back to a properly shaped array
    samp = samp.reshape(self.shape)
    # Convert to a fs and mask the bad entries
    samp = Spectrum(samp, mask=self.mask, data_folded=self.folded,
                    pop_ids = self.pop_ids)
    return samp

`scramble_pop_ids(mask_corners=True)`

Spectrum corresponding to scrambling individuals among populations.

This is useful for assessing how diverged populations are. Essentially, it pools all the individuals represented in the fs and generates new populations of random individuals (without replacement) from that pool. If this fs is significantly different from the original, that implies population structure.

Source code in dadi/Spectrum_mod.py

def scramble_pop_ids(self, mask_corners=True):
    """
    Spectrum corresponding to scrambling individuals among populations.

    This is useful for assessing how diverged populations are.
    Essentially, it pools all the individuals represented in the fs and
    generates new populations of random individuals (without replacement)
    from that pool. If this fs is significantly different from the
    original, that implies population structure.
    """
    original_folded = self.folded
    # If we started with an folded Spectrum, we need to unfold before
    # projecting.
    if original_folded:
        self = self.unfold()

    total_samp = numpy.sum(self.sample_sizes)

    # First generate a 1d sfs for the pooled population.
    combined = numpy.zeros(total_samp+1)
    # For each entry in the fs, this is the total number of derived alleles
    total_per_entry = self._total_per_entry()
    # Sum up to generate the equivalent 1-d spectrum.
    for derived,counts in zip(total_per_entry.ravel(), self.ravel()):
        combined[derived] += counts

    # Now resample back into a n-d spectrum
    # For each entry, this is the counts per popuation. 
    #  e.g. counts_per_entry[3,4,5] = [3,4,5]
    counts_per_entry = self._counts_per_entry()
    # Reshape it to be 1-d, so we can iterate over it easily.
    counts_per_entry = counts_per_entry.reshape(numpy.prod(self.shape), 
                                                self.ndim)
    resamp = numpy.zeros(self.shape)
    for counts, derived in zip(counts_per_entry, total_per_entry.ravel()):
        # The probability here is 
        # (t1 choose d1)*(t2 choose d2)/(ntot choose derived)
        lnprob = sum(_lncomb(t,d) for t,d in zip(self.sample_sizes,counts))
        lnprob -= _lncomb(total_samp, derived)
        prob = numpy.exp(lnprob)
        # Assign result using the appropriate weighting
        resamp[tuple(counts)] += prob*combined[derived]

    resamp = Spectrum(resamp, mask_corners=mask_corners)
    if not original_folded:
        return resamp
    else:
        return resamp.fold()

`theta_L()`

theta_L as defined by Zeng et al. "Statistical Tests for Detecting Positive Selection by Utilizing High-Frequency Variants" (2006) Genetics

Note that is only sensible for 1-dimensional spectra.

Source code in dadi/Spectrum_mod.py

def theta_L(self):
    """
    theta_L as defined by Zeng et al. "Statistical Tests for Detecting
    Positive Selection by Utilizing High-Frequency Variants" (2006)
    Genetics

    Note that is only sensible for 1-dimensional spectra.
    """
    if self.Npop != 1:
        raise ValueError("Only defined on a one-dimensional fs.")

    n = self.sample_sizes[0]
    return numpy.sum(numpy.arange(1,n)*self[1:n])/(n-1)

`to_file(fname, precision=16, comment_lines=[], foldmaskinfo=True)`

Write frequency spectrum to file.

Parameters:

Name	Type	Description	Default
`fname`	`str`	File name to write to. If string ends in .gz, file will be saved with gzip compression.	required
`precision`	`int`	precision with which to write out entries of the SFS. (They are formated via `%.<p>g`, where `<p>` is the precision.)	`16`
`comment_lines`	`list[str]`	list of strings to be used as comment lines in the header of the output file.	`[]`
`foldmaskinfo`	`bool`	If False, folding and mask and population label information will not be saved. This conforms to the file format for dadi versions prior to 1.3.0.	`True`

The file format is:

- # Any number of comment lines beginning with a '#'

- A single line containing N integers giving the dimensions of the fs
  array. So this line would be '5 5 3' for an SFS that was 5x5x3.
  (That would be 4x4x2 *samples*.)

- On the *same line*, the string 'folded' or 'unfolded' denoting the
  folding status of the array

- On the *same line*, optional strings each containing the population
  labels in quotes separated by spaces, e.g. "pop 1" "pop 2"

- A single line giving the array elements. The order of elements is 
  e.g.: fs[0,0,0] fs[0,0,1] fs[0,0,2] ... fs[0,1,0] fs[0,1,1] ...

- A single line giving the elements of the mask in the same order as
  the data line. '1' indicates masked, '0' indicates unmasked.

Source code in dadi/Spectrum_mod.py

def to_file(self, fname, precision=16, comment_lines = [], 
            foldmaskinfo=True):
    """
    Write frequency spectrum to file.

    Args:
        fname (str): File name to write to.  If string ends in .gz, file will be saved
            with gzip compression.
        precision (int, optional): precision with which to write out entries of the SFS. (They
                are formated via `%.<p>g`, where `<p>` is the precision.)
        comment_lines (list[str], optional): list of strings to be used as comment lines in the header
                    of the output file.
        foldmaskinfo (bool, optional): If False, folding and mask and population label
                    information will not be saved. This conforms to the file
                    format for dadi versions prior to 1.3.0.

    The file format is:

        - # Any number of comment lines beginning with a '#'

        - A single line containing N integers giving the dimensions of the fs
          array. So this line would be '5 5 3' for an SFS that was 5x5x3.
          (That would be 4x4x2 *samples*.)

        - On the *same line*, the string 'folded' or 'unfolded' denoting the
          folding status of the array

        - On the *same line*, optional strings each containing the population
          labels in quotes separated by spaces, e.g. "pop 1" "pop 2"

        - A single line giving the array elements. The order of elements is 
          e.g.: fs[0,0,0] fs[0,0,1] fs[0,0,2] ... fs[0,1,0] fs[0,1,1] ...

        - A single line giving the elements of the mask in the same order as
          the data line. '1' indicates masked, '0' indicates unmasked.
    """
    # Open the file object.
    if fname.endswith('.gz'):
        fid = gzip.open(fname, 'wb')
    else:
        fid = open(fname, 'w')

    # Write comments
    for line in comment_lines:
        fid.write('# ')
        fid.write(line.strip())
        fid.write('\n')

    # Write out the shape of the fs
    for elem in self.data.shape:
        fid.write('%i ' % elem)

    if foldmaskinfo:
        if not self.folded:
            fid.write('unfolded')
        else:
            fid.write('folded')
        if self.pop_ids is not None:
            for label in self.pop_ids:
                fid.write(' "%s"' % label)

    fid.write('\n')

    # Write the data to the file. The obnoxious ravel call is to
    # ensure compatibility with old version that used self.data.tofile.
    numpy.savetxt(fid, [self.data.ravel()], delimiter=' ',
                  fmt='%%.%ig' % precision)

    if foldmaskinfo:
        # Write the mask to the file
        numpy.savetxt(fid, [numpy.asarray(self.mask, int).ravel()],
                      delimiter=' ', fmt='%d')

    fid.close()

`unfold()`

Unfolded frequency spectrum

It is assumed that each state of a SNP is equally likely to be ancestral.

Note also that unfolding is not done in-place. The return value is a new Spectrum object.

Source code in dadi/Spectrum_mod.py

def unfold(self):
    """
    Unfolded frequency spectrum

    It is assumed that each state of a SNP is equally likely to be
    ancestral.

    Note also that unfolding is not done in-place. The return value is a new
    Spectrum object.
    """
    if not self.folded:
        raise ValueError('Input Spectrum is not folded.')

    # Unfolding the data is easy.
    reversed_data = reverse_array(self.data)
    newdata = (self.data + reversed_data)/2.

    # Unfolding the mask is trickier. We want to preserve masking of entries
    # that were masked in the original Spectrum.
    # Which entries in the original Spectrum were masked solely because
    # they are incompatible with a folded Spectrum?
    total_samples = numpy.sum(self.sample_sizes)
    total_per_entry = self._total_per_entry()
    where_folded_out = total_per_entry > int(total_samples/2)

    newmask = numpy.logical_xor(self.mask, where_folded_out)
    newmask = numpy.logical_or(newmask, reverse_array(newmask))

    outfs = Spectrum(newdata, mask=newmask, data_folded=False, 
                     pop_ids=self.pop_ids)
    outfs.extrap_x = self.extrap_x
    return outfs

`unmask_all()`

Unmask all values.

Source code in dadi/Spectrum_mod.py

def unmask_all(self):
    """
    Unmask all values.
    """
    self.mask[[slice(None)]*self.Npop] = False

Spectrum_mod

Spectrum

Fst()

S()

Tajima_D()

Watterson_theta()

Zengs_E()

combine_pops(tocombine)

combine_two_pops(tocombine)

filter_pops(tokeep, mask_corners=True)

fixed_size_sample(nsamples, only_nonmasked=False)

fold()

from_data_dict(data_dict, pop_ids, projections, mask_corners=True, polarized=True) staticmethod

from_data_dict_corrected(data_dict, pop_ids, projections, fux_filename, force_pos=True, mask_corners=True) staticmethod

from_demes(g, sampled_demes, sample_sizes, pts, log_extrap=False, sample_times=None, Ne=None, ancestral_misid=False) staticmethod

from_file(fname, mask_corners=True, return_comments=False) staticmethod

from_ms_file(fid, average=True, mask_corners=True, return_header=False, pop_assignments=None, pop_ids=None, bootstrap_segments=1) staticmethod

from_phi(phi, ns, xxs, mask_corners=True, pop_ids=None, admix_props=None, het_ascertained=None, force_direct=False) staticmethod

from_phi_inbreeding(phi, ns, xxs, Fs, ploidys, mask_corners=True, pop_ids=None, admix_props=None, het_ascertained=None, force_direct=True) staticmethod

from_sfscode_file(fid, sites='all', average=True, mask_corners=True, return_header=False, pop_ids=None) staticmethod

log()

marginalize(over, mask_corners=True)

mask_corners()

pi()

project(ns)

reorder_pops(neworder)

sample()

scramble_pop_ids(mask_corners=True)

theta_L()

to_file(fname, precision=16, comment_lines=[], foldmaskinfo=True)

unfold()

unmask_all()

`Spectrum`

`Fst()`

`S()`

`Tajima_D()`

`Watterson_theta()`

`Zengs_E()`

`combine_pops(tocombine)`

`combine_two_pops(tocombine)`

`filter_pops(tokeep, mask_corners=True)`

`fixed_size_sample(nsamples, only_nonmasked=False)`

`fold()`

`from_data_dict(data_dict, pop_ids, projections, mask_corners=True, polarized=True)` `staticmethod`

`from_data_dict_corrected(data_dict, pop_ids, projections, fux_filename, force_pos=True, mask_corners=True)` `staticmethod`

`from_demes(g, sampled_demes, sample_sizes, pts, log_extrap=False, sample_times=None, Ne=None, ancestral_misid=False)` `staticmethod`

`from_file(fname, mask_corners=True, return_comments=False)` `staticmethod`

`from_ms_file(fid, average=True, mask_corners=True, return_header=False, pop_assignments=None, pop_ids=None, bootstrap_segments=1)` `staticmethod`

`from_phi(phi, ns, xxs, mask_corners=True, pop_ids=None, admix_props=None, het_ascertained=None, force_direct=False)` `staticmethod`

`from_phi_inbreeding(phi, ns, xxs, Fs, ploidys, mask_corners=True, pop_ids=None, admix_props=None, het_ascertained=None, force_direct=True)` `staticmethod`

`from_sfscode_file(fid, sites='all', average=True, mask_corners=True, return_header=False, pop_ids=None)` `staticmethod`

`log()`

`marginalize(over, mask_corners=True)`

`mask_corners()`

`pi()`

`project(ns)`

`reorder_pops(neworder)`

`sample()`

`scramble_pop_ids(mask_corners=True)`

`theta_L()`

`to_file(fname, precision=16, comment_lines=[], foldmaskinfo=True)`

`unfold()`

`unmask_all()`