Fit a model to one tree
We evaluate a support for a specific phylogeny using piq_fit_tree.
For this simple case, we will assess the support for a specific phylogeny using the GTR model on one alignment.
In [1]:
Copied!
import cogent3
from piqtree import download_dataset
aln_path = download_dataset("example.phy.gz", dest_dir="data")
aln = cogent3.load_aligned_seqs(aln_path, moltype="dna", format_name="phylip")
aln
import cogent3
from piqtree import download_dataset
aln_path = download_dataset("example.phy.gz", dest_dir="data")
aln = cogent3.load_aligned_seqs(aln_path, moltype="dna", format_name="phylip")
aln
Out[1]:
| 0 | |
| Bird | CTACCACACCCCAGGACTCAGCAGTAATTAACCTTAAGCATAAGTGTAACTTGACTTAGC |
| LngfishAu | ..C.............AC.......G......A..G........C.A.G.......C... |
| LngfishSA | .A..............AA.............T....G.........A........C...T |
| LngfishAf | .A...............A.............AA..GGA................TCC... |
| Frog | AA.TTTGGT..TGT..T........G..A...A..G.A...G..C.A.G..C..T.C..T |
| Turtle | ..T......................G..A..AA...........C.A.G..........T |
| Sphenodon | ..C..............A.......G.....TA.............A............T |
| Lizard | ..T........A...CA........G..A...A...........C.A.G..........T |
| Crocodile | ..C............C.A........G....TA...G.......C.A.G.C....C...T |
| Human | ................AA.......G.........T.......AC.A.GT..A...A... |
17 x 1998 (truncated to 10 x 60) dna alignment
We use a tree corresponding to the data in the example.phy file.
In [2]:
Copied!
tree_path = download_dataset("example.tree.gz", dest_dir="data")
tree = cogent3.load_tree(tree_path)
tree.get_figure().show()
tree_path = download_dataset("example.tree.gz", dest_dir="data")
tree = cogent3.load_tree(tree_path)
tree.get_figure().show()
We now take a look at the help for the piq_fit_tree app.
In [3]:
Copied!
from cogent3 import app_help
app_help("piq_fit_tree")
from cogent3 import app_help
app_help("piq_fit_tree")
Options for making the app
--------------------------
piq_fit_tree_app = get_app(
'piq_fit_tree',
tree: cogent3.core.tree.PhyloNode,
model: piqtree.model._model.Model | str,
num_threads=None,
other_options='',
bl_fixed=False,
)
Fit branch lengths and likelihood for a tree.
Given a sequence alignment and a fixed topology,
uses IQ-TREE to fit branch lengths to the tree.
Parameters
----------
aln : Alignment
The sequence alignment.
tree : PhyloNode
The topology to fit branch lengths to.
model : Model | str
The substitution model with base frequencies and rate heterogeneity.
num_threads: int | None, optional
Number of threads for IQ-TREE to use, by default None (single-threaded).
If 0 is specified, IQ-TREE attempts to find the optimal number of threads.
bl_fixed: bool, optional
If True, evaluates likelihood using the provided branch lengths on the tree.
Branch lengths will be treated as constant in this case, with any unspecified
branch lengths still being optimised. Otherwise if False, branch lengths are
fitted to the tree whether provided or not. By default False.
other_options: str, optional
Additional command line options for IQ-TREE.
Returns
-------
PhyloNode
A phylogenetic tree with the same given topology fitted with branch lengths.
Input type
----------
Alignment
Output type
-----------
PhyloNode
We fit a GTR model and estimate the nucleotide frequencies by maximum likelihood.
In [4]:
Copied!
from cogent3 import get_app
fit_gtr = get_app("piq_fit_tree", tree, model="GTR+FO")
fit_gtr
from cogent3 import get_app
fit_gtr = get_app("piq_fit_tree", tree, model="GTR+FO")
fit_gtr
Out[4]:
piq_fit_tree(tree=Tree("(LngfishAu,(LngfishSA,LngfishAf),(Frog,((Turtle,((Sphenodon,Lizard),(Crocodile,Bird))),(((Human,(Seal,(Cow,Whale))),(Mouse,Rat)),(Platypus,Opossum)))));"),
model='GTR+FO', num_threads=None, other_options='', bl_fixed=False)
Fit the model¶
In [5]:
Copied!
fit = fit_gtr(aln)
fit = fit_gtr(aln)
The fit object is a cogent3 tree (PhyloNode) and the maximum likelihood estimated parameters are stored in the params attribute.
Note "lnL" is short for log likelihood and is the log likelihood of the model. "mprobs" is short for motif probabilities and are the estimated equilibrium frequencies of the nucleotides.
In [6]:
Copied!
fit.params
fit.params
Out[6]:
{'lnL': -22677.6013765,
'mprobs': {'A': 0.3112380546,
'C': 0.2540760845,
'G': 0.2022644614,
'T': 0.2324213996},
'model': 'GTR+FO',
'source': 'data/example.phy.gz'}