BayesTraits Wrapper (btw)
btw is an R package for running the BayesTraits phylogenetic comparative methods software from R. The functions in
btw run BayesTraits on your system, import and parse the output files in R, and delete the output files from your system.
There are two major advantages to using
btw rather than running BayesTraits from the command line. First,
btw makes it easy to incorporate BayesTraits into R scripts. Second,
btw automatically parses BayesTraits output into ready-to-use formats for R analyses, greatly simplifying your workflow if you analyze your BayesTraits output in R.
BayesTraits was developed by Mark Pagel and Andrew Meade, and is available from their website as an executable file that can be run from a command line program such as Terminal (MacOS) or Command Prompt (Windows).
PLEASE read the BayesTraits documentation thoroughly before trying to use
btw, as the tutorial presented on this page assumes you are familiar with the capabilities, commands, and types of output produced by BayesTraits. I cannot emphasize this enough - if you don’t know how to use BayesTraits, then you are going to have problems using
btw. The purpose of
btw is not to distance users from the original software and documentation; it is simply a tool to streamline your workflow.
This is version 2 of
btw, which I added in May 2018. There are three significant changes from version 1:
- Windows compatibility (formerly
btwonly worked on MacOS)
- Simplified structure for specifying BayesTraits commands
- It works with BayesTraitsV3
I recommend working with this version of
btw and BayesTraitsV3 (which is the most recent version as of May 2018). If you want to work with version 1 of
btw, visit the version 1 project page for a tutorial that demonstrations how to install and use version 1.
There is one feature of BayesTraitsV3 that
btw does not support: the variable rates model. I don’t think this is a great loss because I’m skeptical about the model - I think it produces vastly overparameterized models (see Ho & Ané 2014 for an excellent discussion of this issue). On a practical note, the output of this model is a nightmare and I had a hard time understanding it. Pagel & Meade, perhaps recognizing this, created a web tool to process the output from their variable rates model, so if you want to run their variable rates model you can use their web tool.
If you use
btw and you notice any ways in which it can be improved, please let me know in a comment or e-mail (comments are preferred since it can be useful for others to see your comment and my response). Feedback from users was a big part of why I produced this update - it let me know that my code is being used and gave me specific ideas for how to improve it!
If you have the
devtools package installed, install the most recent version of
btw from GitHub like this:
library("devtools") install_github("rgriff23/btw") library("btw")
btw, you must navigate to the directory containing BayesTraitsV3 using the
setwd command. On my Mac, BayesTraitsV3 is in a directory on my desktop, so I navigate there like this:
You can double-check that you are in the right directory by typing
list.files() in your R console. If you are in the right place, then BayesTraitsV3 will be one of the file names that prints to your R console.
All of the output files produced by
btw and BayesTraits will reside in this directory. This is an important point, because any files existing in this directory can be overwritten if they happen to have the same name as the files being produced by
btw or BayesTraits. Specifically, files named
inputfile.txt, or BayesTraits output files beginning with
data.txt will be overwritten. The point is, don’t put any files with these names in the BayesTraitsV3 directory unless you are prepared for them to be overwritten and/or deleted.
btw comes with an example phylogeny and several data sets, which you can load into your R session after
btw is loaded.
ls() into your R console to see the data files that were loaded. Some important formatting points about the data:
- Phylogenies must be of class
- The first column of your data file must contain species names
- Species names must match exactly between the tree and data (but order doesn’t matter)
- You cannot have spaces in your species names
- Discrete characters should be represented with characters or factors (not integers!) between 0 and 9
- A discrete character/factor with more than 1 digit is interpreted as ambiguous (e.g., “01”), and indicates that the species is equally likely to be in state “0” or “1” (now you see why the discrete characters can’t be of class ‘integer’, because 01 would just be seen as 1)
- If testing for correlated evolution in discrete characters, use 0 and 1 as the character states
- The only valid way to represent missing data is with a “-“ character
Now let’s run some analyses.
In this updated version of
btw, you must list the desired BayesTraits commands in a character vector, in the correct order. The only exception is that you do not need to include the final “run” command. So, for example, to fit the Multistate model in maximum likelihood mode without any additional parameters, your vector of commands would simply be:
# commands to run multistate model ("1") in Maximum Likelihood mode ("1") command_vec1 <- c("1", "1")
To run the analysis using the data files
primate.tree1, you would pass these files as the first and second argument to the
bayestraits function, and then pass the command vector as the third argument.
results_1 <- bayestraits(primate.discrete1, primate.tree1, command_vec1)
You can include any valid BayesTraits commands in the command vector, but avoid commands that specify names for the output files, since the naming of output files is handled within the
bayestraits function. If you specify file names for the BayesTraits output, it WILL screw things up because the
bayestraits function won’t be able to find those files to scan the results into R.
There are two additional logical arguments that you can pass to
bayestraits. First, the
silent argument tells
bayestraits whether to display output from BayesTraits in the R console - the default is
silent = TRUE, which silences the output from BayesTraits. Second, the
remove_files argument tells
bayestraits whether to remove the output files from the working directory - the default is
remove_files = TRUE, which results in all the files produced by
btw and BayesTraits being deleted from disk after the results are scanned into R.
Now let’s dive into the output.
The result of running
bayestraits is a named list, with each element containing the parsed output from a single BayesTraits output file. For the simple model we just ran, there is only one output file, the Log file. The parsed output from the log file can be pulled from the results list like this:
log_file_1 <- results_1$Log
The information from the log file is stored as a named list with two elements: 1) a block of information about the options specified for the model, and 2) a table of results. In maximum likelihood mode, there is one row of results for each tree in the tree file. In MCMC mode, there is one row for each sample from the posterior distribution. These elements can be accessed as with any named list:
model_options_1 <- log_file_1$options model_results_1 <- log_file_1$results
If we run an MCMC analysis, we get an additional output file called the Schedule file, which is used for monitoring chain mixing. Like log files, the schedule file is stored as a named list in R and can be accessed by name.
# commands to run multistate model ("1") in MCMC mode ("2") command_vec_2 <- c("1", "2") # run analysis results_2 <- bayestraits(primate.discrete1, primate.tree1, command_vec_2) # extract log and schedule file results log_2 <- results_2$Log schedule_2 <- results_2$Schedule
The number of items in the schedule list will depend on the model settings.
There are several more types of output files produced by BayesTraitsV3. The AncStates file is unique to the geographic model implemented in BayesTraitsV3. We can fit a geographic model and access the AncStates output like this:
# commands to run geographic model ("13") in maximum likelihood mode command_vec_3 <- c("13") # run analysis results_3 <- bayestraits(primate.continuous2, primate.tree100, command_vec_3) # extract log, schedule, and AncStates file results log_3 <- results_3$Log schedule_3 <- results_3$Schedule ancestors_3 <- results_3$AncStates
The next analysis introduces the Stones file, which is produced when the stepping stone sampler is used.
# commands to run Multistate model ("1") in MCMC mode ("2") with stepping stone sampler command_vec_4 <- c("1", "2", "Stones 100 1000") # run analysis results_4 <- bayestraits(primate.discrete1, primate.tree1, command_vec_4) # extract log, schedule, and stones file results log_4 <- results_4$Log schedule_4 <- results_4$Schedule stones_4 <- results_4$Stones
Finally, the OutputTrees file is produced when the
SaveTrees command is used. This is just a nexus file.
# commands to run continuous random walk trait model ("4") in MCMC mode ("2"), estimating lambda # and saving initial trees, transformed trees command_vec_5 <- c("4", "2", "lambda", "SaveTrees") # run analysis results_5 <- bayestraits(primate.continuous1, primate.tree100, command_vec_5) # extract log, schedule, and output trees log_5 <- results_5$Log schedule_5 <- results_5$Schedule output_trees_5 <- results_5$OutputTrees
It is also possible to tell BayesTraits to save the initial sample of trees using the
SaveInitialTrees command, and this requires you to specify a file name for the saved trees. You can tell BayesTraits to do this in the command vector, but
btw won’t handle the saved initial trees in any way - they will just be saved in your working directory, and if you wish to bring them into R, you can do it using
Parsing existing BayesTraits output
In case you have some BayesTraits output that you want to read into R without actually running
bayestraits, I have provided several parsing functions that can be used for this purpose. Previously the code for these functions was embedded inside the functions for running BayesTraits so you couldn’t use them independently. There are four parsing functions:
parse_log- parses a BayesTraits log file
parse_schedule- parses a BayesTraits schedule file
parse_stones- parses a BayesTraits stones file
parse_ancstates- parses a BayesTraits AncStates file
Each function takes a file path as input and produces output identical to the output produced by
bayestraits, except for a single output file rather than all of the output files from an analysis. There are no special functions for importing the tree files because you can just use
read.nexus from the
There are a few more functions included in
btw. An important one is
killbt, which simply kills all BayesTraits processes running on your system. If you accidentally start a long BayesTraits processes and realize that you made a mistake, you have to kill that process otherwise it will keep running in the background and can really bog down your system. You can execute this function by hitting ESC or Ctrl-z to escape the function within R and typing
killbt() in the R console. Sometimes you can’t escape the function without restarting your R session. That’s fine, but keep in mind that the process can keep running after R shuts down, so the first thing you will want to do when you restart R is run
Another couple of handy shortcuts are
bftest, which perform likelihood ratio and Bayes Factor tests respectively. Each function takes a pair of model results as arguments, e.g.,
lrtest(model_results_1, model_results_2). It isn’t difficult to extract the relevant quantities from the model results and perform the tests on your own, but these functions can save you some typing.
Before messaging me, check that you are specifying all of the BayesTraits parameters correctly. Go to your command line program (e.g., Terminal on a Mac, Command Prompt in Windows) and make sure you can run BayesTraits using the commands exactly as you have specified them in your command vector.
If you are specifying the commands correctly and
bayestraits is still not working properly, then there may be a problem with the parsing functions. You can investigate this by running BayesTraits from the command line and trying to use the parsing functions to pull each output file into R. If one of these fails, then you know the problem is with that function.
It is very possible that BayesTraits produces output that is unexpected in some way and my parsing functions won’t work. Sometimes, this is because of inconsistencies and typos in BayesTraits. For example, when you run BayesTraits in MCMC mode, the Log file contains a line that says “Analysis Type: MCMC”, but when you run BayesTraits in Maximum Likelihood mode, the word “Analysis” is misspelled in that line, so it reads “Analsis Type: Maximum Likelihood”. Luckily I discovered this one early on and worked around it, but I’m sure there are other places where unexpected output will break my parsing functions, so do let me know if you find a problem. Be sure to send me the data and commands you used so that I can replicate the problem.