
Find the model of best fit with ModelFinder

IQ-TREE's ModelFinder can be used to automatically find the best-fitting model for an alignment via model_finder. The best-scoring model under the Akaike information criterion (AIC), the corrected Akaike information criterion (AICc), or the Bayesian information criterion (BIC) can then be selected.
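For readers unfamiliar with the three criteria, the following is a minimal sketch of their standard textbook definitions. It is not piqtree or IQ-TREE code: the function name `information_criteria` and the example numbers are hypothetical, and taking the sample size to be the number of alignment sites is an assumption made for illustration. Lower scores indicate a better trade-off between fit and complexity.

```python
import math


def information_criteria(
    log_likelihood: float, num_params: int, num_sites: int
) -> dict[str, float]:
    """Compute AIC, AICc and BIC from a model's maximised log-likelihood.

    num_params is the number of free parameters (k) and num_sites is the
    sample size (n), assumed here to be the alignment length.
    """
    if num_sites - num_params - 1 <= 0:
        # The AICc correction term is undefined when n <= k + 1.
        raise ValueError("AICc requires num_sites > num_params + 1")

    aic = 2 * num_params - 2 * log_likelihood
    aicc = aic + (2 * num_params * (num_params + 1)) / (num_sites - num_params - 1)
    bic = num_params * math.log(num_sites) - 2 * log_likelihood
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


# Hypothetical values: log-likelihood of -1000 with 10 free parameters
# on a 500-site alignment.
scores = information_criteria(log_likelihood=-1000.0, num_params=10, num_sites=500)
print(scores)
```

BIC penalises each additional parameter by ln(n) rather than 2, so on longer alignments it tends to favour simpler models than AIC does. This is one reason the three criteria can disagree on the best model.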

Usage

Basic Usage

Construct a cogent3 alignment object, then pass it to model_finder to find the best-fitting model under each criterion.

from cogent3 import load_aligned_seqs
from piqtree import model_finder

aln = load_aligned_seqs("my_alignment.fasta", moltype="dna")

result = model_finder(aln)

best_aic_model = result.best_aic
best_aicc_model = result.best_aicc
best_bic_model = result.best_bic

Specifying the Search Space

We expose the mset, mfreq and mrate parameters from IQ-TREE's ModelFinder to specify the substitution model search space, base frequency search space, and rate heterogeneity search space respectively. Each is specified as a set of strings, passed as model_set, freq_set and rate_set respectively.

from cogent3 import load_aligned_seqs
from piqtree import model_finder

aln = load_aligned_seqs("my_alignment.fasta", moltype="dna")

result = model_finder(aln, model_set={"HKY", "TIM"})

best_aic_model = result.best_aic
best_aicc_model = result.best_aicc
best_bic_model = result.best_bic

Reproducible Results

For reproducible results, a random seed may be specified.

Caution: 0 is itself a valid random seed. Passing None is equivalent to not specifying a random seed.

from cogent3 import load_aligned_seqs
from piqtree import model_finder

aln = load_aligned_seqs("my_alignment.fasta", moltype="dna")

result = model_finder(aln, rand_seed=5)

best_aic_model = result.best_aic
best_aicc_model = result.best_aicc
best_bic_model = result.best_bic
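The distinction between 0 and None matters when the seed comes from user input, because a truthiness check such as `if seed:` would silently treat 0 as "no seed". The sketch below shows one way to keep the two apart. `parse_seed` is a hypothetical helper, not part of piqtree.

```python
from typing import Optional


def parse_seed(text: str) -> Optional[int]:
    """Convert a user-supplied string into a value suitable for rand_seed.

    An empty string means "no seed" and maps to None. Any integer,
    including 0, is kept as a specific seed.
    """
    text = text.strip()
    if text == "":
        return None
    return int(text)


# "0" is a real seed and must not collapse to None.
print(parse_seed("0"))   # 0
print(parse_seed(""))    # None

# The result could then be passed on, for example:
# result = model_finder(aln, rand_seed=parse_seed(user_input))
```

When testing the parsed value elsewhere, compare with `is None` rather than relying on truthiness, so that a seed of 0 is preserved.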

Multithreading

To speed up computation, the number of threads to be used may be specified. By default, the computation is done on a single thread. If 0 is specified, then IQ-TREE attempts to determine the optimal number of threads.

Caution: If 0 is specified with small datasets, the time to determine the optimal number of threads may exceed the time taken by the model search itself.

from cogent3 import load_aligned_seqs
from piqtree import model_finder

aln = load_aligned_seqs("my_alignment.fasta", moltype="dna")

result = model_finder(aln, num_threads=4)

best_aic_model = result.best_aic
best_aicc_model = result.best_aicc
best_bic_model = result.best_bic
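To avoid the cost of automatic thread detection on small datasets, an explicit thread count can be chosen up front. The following sketch caps a requested count at the number of CPUs reported by the operating system and never returns 0. `choose_num_threads` is a hypothetical helper, not part of piqtree, and whether more threads actually help depends on the alignment.

```python
import os
from typing import Optional


def choose_num_threads(requested: Optional[int] = None) -> int:
    """Pick an explicit thread count of at least 1.

    If requested is None, use every CPU the OS reports. Otherwise cap the
    request at the number of available CPUs.
    """
    # os.cpu_count() may return None when the count cannot be determined.
    available = os.cpu_count() or 1
    if requested is None:
        return available
    if requested < 1:
        raise ValueError("requested must be a positive integer")
    return min(requested, available)


print(choose_num_threads(4))

# The result could then be passed on, for example:
# result = model_finder(aln, num_threads=choose_num_threads(4))
```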

Additional options

Additional options can be supplied through other_options, in the same format that would be passed to the IQ-TREE CLI. Any option in other_options that is already exposed as a model_finder parameter will be ignored.

from cogent3 import load_aligned_seqs
from piqtree import model_finder

aln = load_aligned_seqs("my_alignment.fasta", moltype="dna")

result = model_finder(aln, model_set={"HKY", "TIM"}, other_options="-mtree")

See also