Use different kinds of substitution models
piqtree currently supports all named IQ-TREE DNA models, including Lie Markov models, and empirical amino-acid exchange rate matrices, together with options for base frequencies and rate heterogeneity across sites.
We use the Model class to represent the substitution model, which can be constructed from strings or from enums. Most functions can build the model directly from a model string passed to them, so using make_model may be preferable. Substitution models can be combined with base frequency specifications and models of rate heterogeneity across sites.
Usage
Standard DNA Models
DNA models may be specified using the StandardDnaModel enum, or by using
the IQ-TREE string representation. A full list of supported DNA models is available here.
from piqtree import Model
from piqtree.model import StandardDnaModel
hky_model_1 = Model("HKY")
hky_model_2 = Model(StandardDnaModel.HKY)
Parameterisation of Standard DNA Models
IQ-TREE allows DNA models to be parameterised by specifying relative substitution rates. See the IQ-TREE documentation for the precise definition of the parameters for the chosen model.
from piqtree import Model
from piqtree.model import StandardDnaModel
model = Model("GTR{1.0,2.0,1.5,3.7,2.8}")
model = Model(StandardDnaModel.GTR([1.0, 2.0, 1.5, 3.7, 2.8]))
Lie Markov Models
Lie Markov models may be specified using the LieModel enum, or by using
the IQ-TREE string representation. A full list of supported Lie Markov models is available here. The pairing prefix may also be specified: RY for purine-pyrimidine pairing (default), WS for weak-strong pairing, and MK for aMino-Keto pairing.
from piqtree import Model
from piqtree.model import LieModel
lie_ws_6_6_model_1 = Model("WS6.6")
lie_ws_6_6_model_2 = Model(LieModel.LIE_6_6("WS"))
lie_12_12_model_1 = Model("12.12")
lie_12_12_model_2 = Model(LieModel.LIE_12_12)
Parameterisation of Lie Markov Models
IQ-TREE supports the parameterisation of Lie Markov models. The first number (before the period) corresponds to the number of basis matrices used by the model. The first basis matrix is unparameterised, and the parameters refer to the weights (bounded between -0.98 and 0.98 exclusive) of the remaining basis matrices of the model.
from piqtree import Model
from piqtree.model import LieModel
model = Model("MK3.3b{0.3,-0.5}")
model = Model(LieModel.LIE_3_3b("MK", [0.3, -0.5]))
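The rules above imply that a model such as 3.3b, with three basis matrices, takes two weights, each strictly between -0.98 and 0.98. A minimal sketch of that check follows, using only the standard library. The helper check_lie_params is hypothetical and is not part of piqtree; it only illustrates the counting rule and the bounds.

```python
import re

# Hypothetical helper (not part of piqtree): validates a weight list against
# the rules described above for a Lie Markov model name such as "MK3.3b".
LIE_NAME = re.compile(r"(RY|WS|MK)?(\d+)\.(\d+)[a-z]?")


def check_lie_params(model_name: str, params: list[float]) -> None:
    match = LIE_NAME.fullmatch(model_name)
    if match is None:
        raise ValueError(f"not a Lie Markov model name: {model_name!r}")

    # The number before the period is the number of basis matrices.
    n_basis = int(match.group(2))
    # The first basis matrix is unparameterised, so one weight fewer is needed.
    expected = n_basis - 1
    if len(params) != expected:
        raise ValueError(f"{model_name} takes {expected} weights, got {len(params)}")

    # Each weight must lie strictly inside (-0.98, 0.98).
    for weight in params:
        if not -0.98 < weight < 0.98:
            raise ValueError(f"weight {weight} is outside (-0.98, 0.98)")


# The parameters used in the example above pass the check.
check_lie_params("MK3.3b", [0.3, -0.5])
```

Running such a check before building the model string gives a clearer error message than a failure deep inside a model fit.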
Amino-acid Models
Amino-acid models may be specified using the AaModel enum, or by using
the IQ-TREE string representation. A full list of supported amino-acid models is available here.
from piqtree import Model
from piqtree.model import AaModel
dayhoff_model_1 = Model("Dayhoff")
dayhoff_model_2 = Model(AaModel.Dayhoff)
nq_yeast_model_1 = Model("NQ.yeast")
nq_yeast_model_2 = Model(AaModel.NQ_yeast)
Base Frequencies
Three types of base frequencies can be specified using the FreqType enum, or alternatively the CustomBaseFreq class can be used to fix base frequencies. Otherwise, the IQ-TREE string representation can be used.
If base frequencies are not specified, the chosen model's default settings are used.
- F: Empirical base frequencies. String representation also used to fix base frequencies.
- FQ: Equal base frequencies.
- FO: Optimised base frequencies by maximum-likelihood.
from piqtree import Model
from piqtree.model import CustomBaseFreq, FreqType
# Default for the GTR model
empirical_freqs_1 = Model("GTR", freq_type="F")
empirical_freqs_2 = Model("GTR", freq_type=FreqType.F)
equal_freqs_1 = Model("GTR", freq_type="FQ")
equal_freqs_2 = Model("GTR", freq_type=FreqType.FQ)
opt_freqs_1 = Model("GTR", freq_type="FO")
opt_freqs_2 = Model("GTR", freq_type=FreqType.FO)
custom_freqs_1 = Model("GTR", freq_type="F{0.1,0.2,0.3,0.4}")
custom_freqs_2 = Model("GTR", freq_type=CustomBaseFreq([0.1, 0.2, 0.3, 0.4]))
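When fixing base frequencies by string, the values go inside braces after F. The sketch below builds that string and assumes, as is natural for frequencies, that the values are positive and sum to one; whether piqtree enforces this itself is not stated here. The helper format_custom_freqs is hypothetical and is not part of piqtree.

```python
import math


# Hypothetical helper (not part of piqtree): formats fixed base frequencies
# in the IQ-TREE string form shown above, e.g. "F{0.1,0.2,0.3,0.4}".
def format_custom_freqs(freqs: list[float]) -> str:
    # Assumption: frequencies are positive and sum to one.
    if any(freq <= 0 for freq in freqs):
        raise ValueError("frequencies must be positive")
    if not math.isclose(math.fsum(freqs), 1.0, abs_tol=1e-9):
        raise ValueError("frequencies must sum to 1")
    return "F{" + ",".join(str(freq) for freq in freqs) + "}"


print(format_custom_freqs([0.1, 0.2, 0.3, 0.4]))  # F{0.1,0.2,0.3,0.4}
```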
Rate Heterogeneity
Invariable Sites
A boolean flag can be specified when constructing the Model class to allow for a proportion of invariable sites. To fix the proportion of invariable sites, a float representing the proportion can be used instead.
from piqtree import Model
without_invar_sites = Model("TIM", invariable_sites=False) # Default
with_invar_sites = Model("TIM", invariable_sites=True)
prop_invar = Model("TIM", invariable_sites=0.1)
Discrete Gamma Model
We support the DiscreteGammaModel allowing for a variable number of rate categories (by default 4). The Gamma shape parameter (alpha) can also be set.
from piqtree import Model
from piqtree.model import DiscreteGammaModel
# 4 rate categories, no invariable sites
k81_discrete_gamma_4 = Model("K81", rate_model=DiscreteGammaModel())
# 8 rate categories, with invariable sites
k81_invar_discrete_gamma_8 = Model("K81", rate_model=DiscreteGammaModel(8), invariable_sites=True)
# 4 rate categories, alpha=0.2
k81_alpha = Model("K81", rate_model=DiscreteGammaModel(alpha=0.2))
FreeRate Model
We support the FreeRateModel allowing for a variable number of rate categories (by default 4). The FreeRate weights for each of the rate categories and the corresponding rates may also be fixed.
from piqtree import Model
from piqtree.model import FreeRateModel
# 4 rate categories, no invariable sites
sym_free_rate_4 = Model("SYM", rate_model=FreeRateModel())
# 8 rate categories, with invariable sites
sym_invar_free_rate_8 = Model("SYM", rate_model=FreeRateModel(8), invariable_sites=True)
# 2 rate categories with specified weights and rates
sym_free_rate_parameterised = Model("SYM", rate_model=FreeRateModel(2, weights=[0.2, 0.8], rates=[2.5, 0.625]))
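The weights and rates in the last example sum to one and have a weighted mean rate of one (0.2 × 2.5 + 0.8 × 0.625 = 1). The sketch below assumes this is the intended normalisation for fixed FreeRate parameters, which the text above does not state, and checks it with the standard library. The helper check_free_rate is hypothetical and is not part of piqtree.

```python
import math


# Hypothetical helper (not part of piqtree). Assumption: fixed FreeRate
# parameters should have weights summing to 1 and a weighted mean rate of 1.
def check_free_rate(weights: list[float], rates: list[float]) -> bool:
    if len(weights) != len(rates):
        raise ValueError("need one rate per weight")
    total_weight = math.fsum(weights)
    mean_rate = math.fsum(w * r for w, r in zip(weights, rates))
    return (
        math.isclose(total_weight, 1.0, abs_tol=1e-9)
        and math.isclose(mean_rate, 1.0, abs_tol=1e-9)
    )


# The values from the example above satisfy both conditions.
print(check_free_rate([0.2, 0.8], [2.5, 0.625]))  # True
```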
Making Model Classes from IQ-TREE Strings
For the supported model types, the Model class can be created by using make_model on the IQ-TREE string representation of the model.
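An IQ-TREE model string joins its components with "+": the substitution model, then optionally a frequency type, I for invariable sites, and a rate model such as G8 or R4. The sketch below only assembles such a string. The helper compose_model_string is hypothetical and is not part of piqtree; it does not call make_model.

```python
from typing import Optional


# Hypothetical helper (not part of piqtree): joins model components into an
# IQ-TREE style model string such as "GTR+FO+I+G8".
def compose_model_string(
    substitution: str,
    freq: Optional[str] = None,
    invariable: bool = False,
    rate: Optional[str] = None,
) -> str:
    parts = [substitution]
    if freq is not None:
        parts.append(freq)  # e.g. "F", "FQ", "FO"
    if invariable:
        parts.append("I")  # proportion of invariable sites
    if rate is not None:
        parts.append(rate)  # e.g. "G8" or "R4"
    return "+".join(parts)


print(compose_model_string("GTR", freq="FO", invariable=True, rate="G8"))  # GTR+FO+I+G8
```

A string assembled this way is the kind of input that make_model is described as accepting.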
See also
- Use a Model to construct a maximum likelihood tree: "Construct a maximum likelihood phylogenetic tree".
- Use a Model to fit branch lengths to a tree topology: "Fit branch lengths to a tree topology from an alignment".
- Use a Model to simulate an alignment with AliSim: "Simulate alignments with AliSim".