Selecting models for phylogenetic analysis
We use the piq_model_finder app to rank models. This is the Python binding to the IQ-TREE ModelFinder tool.
In [1]:
from cogent3 import app_help, get_app, load_aligned_seqs
from piqtree import download_dataset
aln_path = download_dataset("example.phy.gz", dest_dir="data")
aln = load_aligned_seqs(aln_path, moltype="dna", format_name="phylip")
Get help and then apply piq_model_finder.
In [2]:
app_help("piq_model_finder")
Overview
--------
Find the models of best fit for an alignment using ModelFinder.
Options for making the app
--------------------------
piq_model_finder_app = get_app('piq_model_finder', *args, **kwargs)
Parameters
----------
aln : Alignment
The alignment to find the model of best fit for.
model_set : Iterable[str] | None, optional
Search space for models.
Equivalent to IQ-TREE's mset parameter, by default None
freq_set : Iterable[str] | None, optional
Search space for frequency types.
Equivalent to IQ-TREE's mfreq parameter, by default None
rate_set : Iterable[str] | None, optional
Search space for rate heterogeneity types.
Equivalent to IQ-TREE's mrate parameter, by default None
rand_seed : int | None, optional
The random seed - None means no seed is used, by default None.
num_threads: int | None, optional
Number of threads for IQ-TREE to use, by default None (single-threaded).
If 0 is specified, IQ-TREE attempts to find the optimal number of threads.
other_options: str, optional
Additional command line options for IQ-TREE.
Returns
-------
ModelFinderResult
Collection of data returned from IQ-TREE's ModelFinder.
Input type
----------
Alignment
Output type
-----------
ModelFinderResult
In [3]:
mfinder = get_app("piq_model_finder")
ranked = mfinder(aln)
ranked
Out[3]:
ModelFinderResult(source=None, best_aic=Model(submod_type=GTR, freq_type=F, rate_type=I+G4), best_aicc=Model(submod_type=GTR, freq_type=F, rate_type=I+G4), best_bic=Model(submod_type=TIM2, freq_type=F, rate_type=I+G4))
Accessing the best model
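The result object exposes the best model under each selection criterion (AIC, AICc, and BIC) as the attributes best_aic, best_aicc, and best_bic. In the output above, AIC and AICc both select GTR+F+I+G4, while BIC selects TIM2+F+I+G4. We'll use AICc as the criterion for choosing the best model.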
In [4]:
selected = ranked.best_aicc
You can inspect the statistics for any of these models by indexing the model_stats attribute with the model.
In [5]:
ranked.model_stats[selected]
Out[5]:
ModelResultValue(lnL=-21148.95667, nfp=41, tree_length=4.200251662)
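The lnL (log-likelihood) and nfp (number of free parameters) values reported above are the inputs that information criteria such as AIC, AICc, and BIC are built from. The following is a minimal sketch using the standard textbook definitions of those criteria; it is not taken from piqtree or IQ-TREE, and the function name information_criteria is hypothetical. The n_sites value of 2000 is a placeholder for the sample size (assumed here to be the alignment length), not the actual length of the example alignment.

```python
import math


def information_criteria(lnL: float, nfp: int, n_sites: int) -> dict[str, float]:
    """Return AIC, AICc and BIC using their standard textbook definitions.

    lnL     : log-likelihood of the fitted model
    nfp     : number of free parameters
    n_sites : sample size, assumed here to be the alignment length
    """
    if n_sites - nfp - 1 <= 0:
        raise ValueError("AICc requires n_sites > nfp + 1")
    aic = 2 * nfp - 2 * lnL
    # AICc adds a small-sample correction to AIC
    aicc = aic + (2 * nfp * (nfp + 1)) / (n_sites - nfp - 1)
    # BIC penalises each parameter by log(sample size)
    bic = nfp * math.log(n_sites) - 2 * lnL
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


# lnL and nfp are copied from Out[5]; n_sites is a hypothetical placeholder.
scores = information_criteria(lnL=-21148.95667, nfp=41, n_sites=2000)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

Lower values indicate a better trade-off between fit and complexity. Because BIC penalises each parameter by log(n) rather than 2, it tends to favour simpler models on long alignments, which is one plausible reason it can pick a different model from AIC and AICc. As a sanity check on nfp, 41 is consistent with 31 branch lengths for an unrooted 17-taxon tree plus 10 substitution-model parameters for GTR+F+I+G4, assuming the usual parameter counting.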
Using the best model
You can apply the selected model to a phylogenetic analysis.
Note: The process is the same for both the piq_build_tree and the piq_fit_tree apps.
In [6]:
fit = get_app("piq_build_tree", selected)
fitted = fit(aln)
fitted
Out[6]:
Tree("(LngfishAu,(LngfishSA,LngfishAf),(Frog,((((Turtle,(Crocodile,Bird)),Sphenodon),Lizard),(((Human,(Seal,(Cow,Whale))),(Mouse,Rat)),(Platypus,Opossum)))));")