Skip to content

Inference

optimize(deme_graph, inference_options, data, pts, maxiter=1000, perturb=0, verbose=0, uL=None, log=True, method='fmin', fit_ancestral_misid=False, misid_guess=None, output_stream=sys.stdout, output=None, overwrite=False)

Optimize demography given a deme graph of initial and fixed parameters and inference options that specify which parameters to fit, and bounds or constraints on those parameters.

Parameters:

Name Type Description Default
deme_graph DemeGraph

A YAML file in demes format.

required
inference_options str

File with inference options See (url) for how to set this up.

required
data Spectrum

The SFS to fit, which must have pop_ids specified. Can either be a Spectrum object or the file path to the stored frequency spectrum. The populations in the SFS need to be present (with matching IDs) in the deme graph.

required
pts list[int]

List of grid points to extrapolate.

required
maxiter int

The maximum number of iterations to run optimization. Defaults to 1000. Note: maxiter does not seem to work with the Powell method! This appears to be a bug within scipy.optimize.

1000
perturb float

The perturbation amount of the initial parameters. Defaults to zero, in which case we do not perturb the initial parameters in the input demes YAML. If perturb is greater than zero, the initial parameters in the input YAML are randomly perturbed by up to perturb-fold.

0
verbose int

The frequency to print updates to output_stream if greater than 1.

0
uL float

The mutuation rate scaled by number of callable sites, so that theta is the compound parameter 4NeuL. If given, we optimize with multinom=False, and theta is determined by taking Ne as the ancestral or root population size, which can be fit. Defaults to None. If uL is not given, we use multinom=True and we likely do not want to fit the ancestral population size.

None
log bool

If True, optimize over log of the parameters.

True
method str

The optimization method. Available methods are "fmin", "powell", and "lbfgsb".

'fmin'
fit_ancestral_misid bool

If True, we fit the probability that the ancestral state of a given SNP is misidenitified, resulting in ancestral/derived labels being flipped. Note: this is only allowed with unfolded spectra. Defaults to False.

False
misid_guess float

Defaults to 0.01.

None
output_stream str or output stream

Defaults to standard output. Can be given an open file stream instead or other output stream.

stdout
output str

If given, the filename for the output best-fit model YAML.

None
overwrite bool

If True, overwrites any existing file with the same output name.

False

Returns:

Name Type Description
param_names list[str]

List of parameter names

xopt list[float]

The optimal parameters

fopt float

The optimal log-likelihood

Source code in dadi/Demes/Inference.py
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
def optimize(
    deme_graph,
    inference_options,
    data,
    pts,
    maxiter=1000,
    perturb=0,
    verbose=0,
    uL=None,
    log=True,
    method="fmin",
    fit_ancestral_misid=False,
    misid_guess=None,
    output_stream=sys.stdout,
    output=None,
    overwrite=False,
):
    """
    Optimize demography given a deme graph of initial and fixed parameters and
    inference options that specify which parameters to fit, and bounds or constraints
    on those parameters.

    Args:
        deme_graph (demes.DemeGraph): A YAML file in ``demes`` format.
        inference_options (str): File with inference options See (url) for how to set this up.
        data (Spectrum): The SFS to fit, which must have pop_ids specified. Can either be a Spectrum
            object or the file path to the stored frequency spectrum. The populations
            in the SFS need to be present (with matching IDs) in the deme graph.
        pts (list[int]): List of grid points to extrapolate.
        maxiter (int, optional): The maximum number of iterations to run optimization. Defaults
            to 1000. Note: maxiter does not seem to work with the Powell method! This
            appears to be a bug within scipy.optimize.
        perturb (float, optional): The perturbation amount of the initial parameters. Defaults to
            zero, in which case we do not perturb the initial parameters in the input
            demes YAML. If perturb is greater than zero, the initial parameters in the input
            YAML are randomly perturbed by up to `perturb`-fold.
        verbose (int, optional): The frequency to print updates to `output_stream` if greater than 1.
        uL (float, optional): The mutuation rate scaled by number of callable sites, so that theta
            is the compound parameter 4*Ne*uL. If given, we optimize with `multinom=False`,
            and theta is determined by taking Ne as the ancestral or root population size,
            which can be fit. Defaults to None. If uL is not given, we use `multinom=True`
            and we likely do not want to fit the ancestral population size.
        log (bool, optional): If True, optimize over log of the parameters.
        method (str, optional): The optimization method. Available methods are "fmin", "powell",
            and "lbfgsb".
        fit_ancestral_misid (bool, optional): If True, we fit the probability that the ancestral
            state of a given SNP is misidenitified, resulting in ancestral/derived
            labels being flipped. Note: this is only allowed with *unfolded* spectra.
            Defaults to False.
        misid_guess (float, optional): Defaults to 0.01.
        output_stream (str or output stream): Defaults to standard output. Can be given an open file stream
            instead or other output stream.
        output (str, optional): If given, the filename for the output best-fit model YAML.
        overwrite (bool, optional): If True, overwrites any existing file with the same output
            name.

    Returns:
        param_names (list[str]): List of parameter names
        xopt (list[float]): The optimal parameters
        fopt (float): The optimal log-likelihood
    """
    # constraints. Other arguments should be kw args in the function.

    # load file, data,
    builder = _get_demes_dict(deme_graph)
    options = _get_params_dict(inference_options)

    if isinstance(data, str):
        data = dadi.Spectrum.from_file(data)

    if data.pop_ids is None:
        raise ValueError("data SFS must specify population IDs")
    if len(data.pop_ids) != len(data.sample_sizes):
        raise ValueError("pop_ids and sample_sizes have different lengths")

    param_names, p0, lower_bound, upper_bound = _set_up_params_and_bounds(
        options, builder
    )
    constraints = _set_up_constraints(options, param_names)

    if fit_ancestral_misid:
        if data.folded:
            raise ValueError(
                "Data is folded - can only fit ancestral misid using unfolded data"
            )
        if misid_guess is None:
            misid_guess = 0.05
        param_names.append("p_misid")
        p0 = np.concatenate((p0, [misid_guess]))
        lower_bound = np.concatenate((lower_bound, [0]))
        upper_bound = np.concatenate((upper_bound, [1]))

    if not (isinstance(perturb, float) or isinstance(perturb, int)):
        raise ValueError("perturb must be a non-negative number")
    if perturb < 0:
        raise ValueError("perturb must be non-negative")
    elif perturb > 0:
        p0 = _perturb_params_constrained(
            p0, perturb, lower_bound, upper_bound, constraints
        )

    available_methods = ["fmin", "powell", "lbfgsb"]
    if method not in available_methods:
        raise ValueError(
            f"method {method} not available,  must be one of {available_methods}"
        )

    # rescale if log
    if log:
        p0 = np.log(p0) + 1
        obj_fun = _object_func_log
    else:
        obj_fun = _object_func
    print("builder:",builder)
    args = (
        data,
        pts,
        builder,
        options,
        lower_bound,
        upper_bound,
        constraints,
        verbose,
        uL,
        fit_ancestral_misid,
        output_stream,
    )

    # run optimization
    if method == "fmin":
        outputs = scipy.optimize.fmin(
            obj_fun,
            p0,
            args=args,
            disp=False,
            maxiter=maxiter,
            maxfun=maxiter,
            full_output=True,
        )
        xopt, fopt, iter, funcalls, warnflag = outputs
    elif method == "powell":
        outputs = scipy.optimize.fmin_powell(
            obj_fun,
            p0,
            args=args,
            disp=False,
            maxiter=maxiter,
            maxfun=maxiter,
            full_output=True,
        )
        xopt, fopt, direc, iter, funcalls, warnflag = outputs
    elif method == "lbfgsb":
        bounds = list(zip(np.log(lower_bound) + 1, np.log(upper_bound) + 1))
        outputs = scipy.optimize.fmin_l_bfgs_b(
            obj_fun,
            p0,
            bounds=bounds,
            epsilon=1e-2,
            args=args,
            iprint=-1,
            pgtol=1e-5,
            maxiter=maxiter,
            maxfun=maxiter,
            approx_grad=True,
        )
        xopt, fopt, info_dict = outputs

    if log:
        xopt = np.exp(xopt - 1)

    if output is not None:
        builder = _update_builder(builder, options, xopt)
        g = demes.Graph.fromdict(builder)
        if overwrite is False and os.path.isfile(output):
            output_stream.write(
                f"Did not write output YAML - {output} already exists. The model is "
                "printed below. To overwrite, set overwrite=True." + os.linesep
            )
            output_stream.write(str(g))
            moments.Misc.delayed_flush(stream=output_stream, delay=0.5)
        else:
            demes.dump(g, output)

    return param_names, xopt, fopt