Starting trees and how to specify them

Gustavo A. Ballen and Sandra Reinales

2025-08-19

Initial trees are used in three different ways in a divergence time estimation analysis: Random trees when (i) the topology and divergence times are co-estimated (e.g., Beast2), and user-defined trees when (ii) complex analyses are carried out comprising several terminals and multiple calibration points, and therefore a good initial tree helps reach convergence, and (iii) when we want to fix the topology and just estimate divergence times.

However, node times must be compatible with the calibration densities, otherwise, the likelihood of the initial tree is zero, and thus, their posterior probability. This generates an issue in programs such as Beast2 which will try a finite (e.g., \(100\)) number of tree proposals; if all these fail, the analysis will not start.

Initial trees are challenging to prepare by hand because they need to be consistent with all node calibrations, in order to the initial product of likelihood and priors is different from zero. Node times need to be within the domain of the calibration distribution, so that branch length need to be specified accordingly. We can infer initial trees using a fast method such as maximum likelihood or maximum parsimony. If the latter is used for inferring an initial tree, TNT is a popular alternative to use. However, its format is specific to the program and differs from standard newick, so it is impossible to use it as initial tree in other types of analyses. tbea has a function for converting from TNT tree format to standard newick, thus allowing to use parsimony trees as initial proposals in Bayesian programs.

# load the packages
library(tbea)
library(ape)

# create a file with multiple trees in TNT format
writeLines(text=c("tread 'three arbitrary trees in TNT format'", 
                  "(Taxon_A ((Taxon_B Taxon_C)(Taxon_D Taxon_E)))*", 
                  "(Taxon_A (Taxon_B (Taxon_C (Taxon_D Taxon_E))))*", 
                  "(Taxon_A (Taxon_C (Taxon_B (Taxon_D Taxon_E))));", 
                  "proc-;"),
           con = "someTrees.tre")

# convert TNT to newick
tnt2newick(file="someTrees.tre", return=TRUE)
## [1] "(Taxon_A,((Taxon_B,Taxon_C),(Taxon_D,Taxon_E)));"
## [2] "(Taxon_A,(Taxon_B,(Taxon_C,(Taxon_D,Taxon_E))));"
## [3] "(Taxon_A,(Taxon_C,(Taxon_B,(Taxon_D,Taxon_E))));"
# cleanup after the example
file.remove("someTrees.tre")
## [1] TRUE

In order to generate a tree in which node times are consistent with the calibration priors we can use a fast method such as penalized likelihood (implemented in the ape package) using the initial tree and our set of node calibrations:

# read tree with ape
tree <- read.tree(text="(Acestrorhynchus_falcirostris:0.1296555,
                        ((((Cynodon_gibbus:0.002334399,Cynodon_septenarius:0.00314063)
                        :0.07580842,((Hydrolycus_scomberoides:0.007028806,Hydrolycus_MUN_16211
                        :0.009577958):0.0225477,(Hydrolycus_armatus:0.002431794,
                        Hydrolycus_tatauaia:0.002788708):0.02830852):0.05110053)
                        :0.02656745,Hydrolycus_wallacei:0.01814363):0.1712442,
                        Rhaphiodon_vulpinus:0.04557667):0.1936416);")

# get the node IDs according to the mrca of pairs of appropriate
# species root
mrca(tree)["Acestrorhynchus_falcirostris", "Hydrolycus_armatus"]
## [1] 10
# Hydrolycus sensu stricto (cf. Hydrolycus)
mrca(tree)["Hydrolycus_scomberoides", "Hydrolycus_armatus"]
## [1] 15
# specify calibrations as a data frame
calibs <- data.frame(node=c(10, 15),
                     age.min=c(43, 40.94),
                     age.max=c(60, 41.6),
                     stringsAsFactors=FALSE)

# calibrate the tree with the information in calibs using ape
set.seed(15)
calibrated <- chronos(phy=tree, calibration=calibs)
## 
## Setting initial dates...
## Fitting in progress... get a first set of estimates
##          (Penalised) log-lik = -2.287137 
## Optimising rates... dates... -2.287137 
## Optimising rates... dates... -2.286804 
## Optimising rates... dates... -2.286252 
## Optimising rates... dates... -2.285774 
## Optimising rates... dates... -2.285357 
## Optimising rates... dates... -2.284991 
## Optimising rates... dates... -2.284665 
## Optimising rates... dates... -2.284374 
## Optimising rates... dates... -2.284112 
## Optimising rates... dates... -2.283877 
## Optimising rates... dates... -2.283665 
## Optimising rates... dates... -2.283471 
## Optimising rates... dates... -2.283296 
## Optimising rates... dates... -2.283134 
## Optimising rates... dates... -2.282986 
## Optimising rates... dates... -2.282851 
## Optimising rates... dates... -2.282725 
## Optimising rates... dates... -2.282609 
## Optimising rates... dates... -2.2825 
## Optimising rates... dates... -2.2824
## Warning: Maximum number of dual iterations reached.
## 
## log-Lik = -2.279377 
## PHIIC = 52.56
# plot the calibrated tree
plot(calibrated, label.offset=1.2)
axisPhylo()

# fetch the plotted tree from the graphic environment
plttree <- get("last_plot.phylo", envir=.PlotPhyloEnv)

# process the node heights and reorder as needed
nodes <- cbind(plttree$xx, plttree$yy)
nodes <- nodes[c(10, 15),]
nodes <- cbind(nodes, calibs[, c("age.min", "age.max")])
colnames(nodes) <- c("x", "y", "min", "max")
nodes <- cbind(nodes, rev.min=-(nodes[, c("min")] - max(plttree$xx)))
nodes <- cbind(nodes, rev.max=-(nodes[, c("max")] - max(plttree$xx)))

# plot the original calibrations as error bars
arrows(x0=nodes[, "rev.max"], x1=nodes[, "rev.min"],
       y0=nodes[,"y"], y1=nodes[,"y"],
       angle=90, length=0.3, lwd=3,
       code=3, col="darkblue")

nodelabels()
tiplabels()

# write the tree
write.tree(phy=calibrated)
## [1] "(Acestrorhynchus_falcirostris:56.75183146,((((Cynodon_gibbus:22.08893867,Cynodon_septenarius:22.08893867):22.79103082,((Hydrolycus_scomberoides:20.57256616,Hydrolycus_MUN_16211:20.57256616):20.36743384,(Hydrolycus_armatus:20.57671246,Hydrolycus_tatauaia:20.57671246):20.36328754):3.939969484):2.096506682,Hydrolycus_wallacei:46.97647617):4.837370917,Rhaphiodon_vulpinus:51.81384708):4.937984379);"