
Tutorial 5: Ancestral States

In this tutorial we will explore how to estimate ancestral states using the sample file croc.xml (additional information can be found in the documentation here). The sample files are in the Sample Files folder included with the software distribution. Red highlights and numbers in the images indicate the steps referred to in each section.

A. OPEN THE FILE: (Figure 1) Open the file by selecting File->Open..., navigate to the Sample Files folder, and open the croc.xml file. Once the file has been read, you should see the window shown in Figure 1.

         Figure 1.

Close the window (Cmd-W). The next step is to define the standard/morphology models for the characters to be analyzed. In this tutorial we will examine the ancestral states of characters 1 and 7.

B. OPEN CONFIGURE MODELS: (Figure 2) Open the Models window by selecting Analysis->Configure Model... (Cmd-1).

(1) Select the Morphology/Std tab if not already selected (see Figure 2).

         Figure 2.

C. CONFIGURE MODEL FOR CHAR 1: (Figure 3) Character 1 is a two-state character.

(1) Select character 1 in the Character table at the left of the window (Figure 3).

(2) Select the Beta distribution prior radio button. (Four options exist for 2-state characters; see here for more details on morphology/standard models.) The beta prior places a beta distribution on the two state frequencies. The distribution's shape is described by the parameter alpha, and because the distribution is discretized, the number of categories (k) can also be set. For this tutorial we will leave alpha and k at their default values. (See here for more details on priors.)

(3) Finally, for the Gamma distribution prior, increase the number of categories to 90.

         Figure 3.
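To make the discretization in step (2) concrete, the sketch below splits a symmetric Beta(alpha, alpha) prior into k equal-probability categories and represents each category by its median. Each value is a frequency for state 0 (state 1 gets one minus that value), and each category carries weight 1/k. Treating the prior as symmetric (one shape parameter, alpha) follows the description above, but using medians of equal-probability bins is an assumption borrowed from the usual discretized-gamma approach; the program may use a different rule, and all function names here are hypothetical. Only the Python standard library is used: the Beta CDF is the regularized incomplete beta function, evaluated by its standard continued fraction and inverted by bisection.

```python
import math


def _betacf(a: float, b: float, x: float) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny, eps = 1e-300, 3e-14
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 300):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),            # even term
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):  # odd term
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor close to 1: converged
            break
    return h


def beta_cdf(x: float, a: float, b: float) -> float:
    """CDF of Beta(a, b), i.e. the regularized incomplete beta I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    # Use the continued fraction directly where it converges fast, else the symmetry relation.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def beta_quantile(p: float, a: float, b: float) -> float:
    """Invert the Beta CDF by bisection (slow but robust)."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discretize_symmetric_beta(alpha: float, k: int) -> list[float]:
    """Median frequency of state 0 in each of k equal-probability categories."""
    if alpha <= 0 or k < 1:
        raise ValueError("alpha must be positive and k at least 1")
    return [beta_quantile((i + 0.5) / k, alpha, alpha) for i in range(k)]


for p in discretize_symmetric_beta(alpha=2.0, k=4):
    print(f"pi0 = {p:.3f}, pi1 = {1 - p:.3f}")
```

Under this sketch, a character's likelihood would be averaged over the k frequency categories with equal weights; a larger k follows the continuous prior more closely at the cost of more computation.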

D. CONFIGURE MODEL FOR CHAR 7: (Figure 4) Character 7 is a three-state character.

(1) Select character 7 in the Character table at the left of the window (Figure 4).

(2) Select the Empirical prior radio button. (See here for more details on the available morphology/standard models.) The empirical prior uses the frequency of each state of character 7 in the data file.

(3) Finally, for the Gamma distribution prior, increase the number of categories to 90.

         Figure 4.
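The empirical prior in step (2) amounts to counting states in the character's column of the data matrix. A minimal sketch, assuming a hypothetical matrix stored as a dict from taxon name to a string of states, and assuming missing (?) and gap (-) entries are simply skipped; how the program itself treats missing data or polymorphic scorings is not covered here:

```python
from collections import Counter


def empirical_state_frequencies(matrix: dict[str, str], char_number: int,
                                skip: str = "?-") -> dict[str, float]:
    """Relative frequency of each observed state for one character.

    `char_number` is 1-based, matching the character numbers in this tutorial.
    Entries listed in `skip` (missing data and gaps) are ignored; this is an
    assumption of the sketch, not documented program behavior.
    """
    column = (row[char_number - 1] for row in matrix.values())
    counts = Counter(state for state in column if state not in skip)
    total = sum(counts.values())
    if total == 0:
        return {}  # every taxon is missing for this character
    return {state: n / total for state, n in sorted(counts.items())}


# Hypothetical 7-character matrix; character 7 has three states (0, 1, 2).
matrix = {
    "taxon_A": "0100102",
    "taxon_B": "1100101",
    "taxon_C": "0?00100",
    "taxon_D": "0000112",
}
print(empirical_state_frequencies(matrix, 7))  # {'0': 0.25, '1': 0.25, '2': 0.5}
```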

E. CLOSE THE MODELS WINDOW: We are now finished configuring the models for characters 1 and 7, so go ahead and close the window.

F. CONFIGURE ANALYSIS: (Figure 5 & 6) Open the Analysis window by selecting Analysis->Configure Analysis... (Cmd-2).

(1) Select the Ancestral states radio button.

(2) Because we are analyzing only characters 1 and 7, exclude all of the other characters from the analysis by highlighting them and pressing the Exclude button.

         Figure 5.

Next, we can choose which trees and which parameters to use in the analysis. In a morphological/standard analysis the parameter options do not apply. To select which trees to analyze, open the Sampling tab (Figure 6).

(1) Trees and parameters are selected through the options in the region highlighted in red. Choose Use all trees (or parameters) to analyze everything in the file, or choose Use trees numbered to analyze a subset of the trees. Trees and parameters are numbered sequentially from the first in the file to the last. In this tutorial we will use all 4 trees in the file, so no changes are necessary.

         Figure 6.

One final note: when performing an ancestral state reconstruction (ASR) analysis of molecular data, one additional option is available, Link parameter order to tree order. When active (the default), this option pairs each tree with the parameter set of the same number (e.g., tree 1 with parameter set 1, tree 2 with parameter set 2, etc.). If your trees and parameters are samples from a posterior distribution (e.g., from MrBayes or a similar program), it is recommended to leave this option active. When the option is inactive ("unlinked"), every possible combination of tree and parameter set is evaluated, which can substantially increase the analysis time.
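The run-time difference comes from how many tree/parameter pairs are evaluated: with n trees and n parameter sets, linking gives n pairs, while unlinking gives n × n. The hypothetical helper below only enumerates those pairs; it is not the program's code, and refusing unequal counts in the linked case is a choice made for this sketch.

```python
from itertools import product


def tree_parameter_pairs(n_trees: int, n_params: int,
                         linked: bool = True) -> list[tuple[int, int]]:
    """(tree, parameter set) pairs to evaluate, both numbered from 1 as in the file."""
    trees = range(1, n_trees + 1)
    params = range(1, n_params + 1)
    if linked:
        # Linked: tree i is evaluated only with parameter set i.
        if n_trees != n_params:
            raise ValueError("this sketch requires equal counts when linked")
        return list(zip(trees, params))
    # Unlinked: every tree is evaluated with every parameter set.
    return list(product(trees, params))


print(len(tree_parameter_pairs(100, 100, linked=True)))   # 100
print(len(tree_parameter_pairs(100, 100, linked=False)))  # 10000
```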

G. CLOSE THE ANALYSIS WINDOW: At this point we are finished configuring the analysis so go ahead and close the window.

H. START THE ANALYSIS: We are now ready to run the analysis. Select Analysis->Run Analysis... (Cmd-R). A progress indicator will show how long remains until the run is finished.

I. REVIEW THE RESULTS: (Figure 7) Now that the analysis is complete, let's look at the results by opening the Ancestral States window via Statistics->Morphology->Ancestral States....

         Figure 7.

This window (Figure 7) contains two tabs: Results and Bipartition Table. The Results tab gives the posterior probability of each state, for each character, at each node of the trees in the data file. The Bipartition Table tab describes the mapping between each Clade ID and the members of that clade. Let's look briefly at the information in each tab.

Results tab

The Results tab contains two tables. The character table (left; Figure 7) lists all of the sites (characters) included in the most recent analysis. The results table (right) displays the results for the character selected in the character table. It has several columns. The first, Clade ID, is an arbitrary identifier from 1 to N assigned to each of the N unique clades found in the trees included in the analysis; it is described in more detail in the discussion of the Bipartition Table below. The next column, Sample Size, gives the number of trees in which the clade was observed. The reported marginal posterior probabilities are averaged over all trees in which the clade exists. The remaining columns give the probability of each state at the internal node defined by that clade.
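The averaging behind the Sample Size and probability columns can be sketched as follows: every clade gets an ID (here, hypothetically, in order of first appearance, since the text only says the numbering is arbitrary), Sample Size counts the trees containing the clade, and the probabilities are averaged over those trees only. The per-tree input format, a dict from clade (a frozenset of translate ids) to a list of state probabilities, and all numbers are made up for illustration.

```python
Clade = frozenset[int]


def summarize_clades(per_tree: list[dict[Clade, list[float]]]
                     ) -> list[tuple[int, int, list[float]]]:
    """Return (clade ID, sample size, mean state probabilities) rows."""
    ids: dict[Clade, int] = {}
    sums: dict[Clade, list[float]] = {}
    counts: dict[Clade, int] = {}
    for tree in per_tree:
        for clade, probs in tree.items():
            if clade not in ids:
                ids[clade] = len(ids) + 1        # arbitrary ID: order of first appearance
                sums[clade] = [0.0] * len(probs)
                counts[clade] = 0
            counts[clade] += 1                   # Sample Size: trees containing the clade
            sums[clade] = [s + p for s, p in zip(sums[clade], probs)]
    # Average only over the trees in which each clade was observed.
    return [(ids[c], counts[c], [s / counts[c] for s in sums[c]]) for c in ids]


# Two hypothetical trees for a two-state character; {1, 2, 3, 4} is the root.
trees = [
    {frozenset({1, 2, 3, 4}): [0.75, 0.25],
     frozenset({1, 2}): [1.0, 0.0],
     frozenset({3, 4}): [0.5, 0.5]},
    {frozenset({1, 2, 3, 4}): [0.25, 0.75],
     frozenset({1, 2, 3}): [0.25, 0.75],
     frozenset({1, 2}): [0.5, 0.5]},
]
for clade_id, sample_size, probs in summarize_clades(trees):
    print(clade_id, sample_size, probs)
# 1 2 [0.5, 0.5]
# 2 2 [0.75, 0.25]
# 3 1 [0.5, 0.5]
# 4 1 [0.25, 0.75]
```

Clades 3 and 4 each occur in only one tree, so their averages rest on a single sample; a small Sample Size is worth keeping in mind when interpreting a clade's probabilities.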

Bipartition Table tab

(Figure 8) To identify the clades you are interested in, use the mapping between Clade ID and bipartition given in the Bipartition Table.

         Figure 8.

To understand how to read a bipartition, consider a very simple example. In a bipartition, each species is represented by either a 1 (the species belongs to the clade) or a 0 (it does not). The left-to-right order of positions is determined by the species id in the input file: for example, <translate id="1">spp_name</translate> means that "spp_name" is identified in the tree by "1" and occupies position 1 of the bipartition.

For example, if you have the following translate tags

<translate id="1">seq_1</translate>
<translate id="2">seq_2</translate>
<translate id="3">seq_3</translate>
<translate id="4">seq_4</translate>

and the following tree, written in Newick format using the translate ids,

((1,2),(3,4));

then the tree has the following bipartitions, one for each internal node:

1111 [The root - all species are included]
1100 [Clade containing seq_1 and seq_2]
0011 [Clade containing seq_3 and seq_4]
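The mapping from a tree to its bipartition strings can be reproduced with a short sketch. It assumes a rooted Newick string whose leaves are the translate ids, with optional branch lengths and no internal node labels (a support value after ")" would be misread as a leaf); the function names are hypothetical. For the four-taxon tree ((1,2),(3,4)), it yields the three bipartitions listed above, in the order the internal nodes close:

```python
def internal_clades(newick: str) -> list[frozenset[int]]:
    """Leaf set (translate ids) below every internal node, in the order nodes close."""
    stack: list[set[int]] = []       # one open set per unclosed "("
    clades: list[frozenset[int]] = []
    i = 0
    while i < len(newick):
        ch = newick[i]
        if ch == "(":
            stack.append(set())
            i += 1
        elif ch == ")":
            members = stack.pop()
            clades.append(frozenset(members))
            if stack:
                stack[-1] |= members  # a child clade belongs to its parent as well
            i += 1
        elif ch.isdigit():
            j = i
            while j < len(newick) and newick[j].isdigit():
                j += 1
            stack[-1].add(int(newick[i:j]))  # leaf label = translate id
            i = j
        elif ch == ":":
            i += 1  # skip a branch length such as ":0.125"
            while i < len(newick) and newick[i] not in ",();":
                i += 1
        else:
            i += 1  # commas, the final semicolon, whitespace
    return clades


def bipartition_string(clade: frozenset[int], n_taxa: int) -> str:
    """Position i (1-based, = translate id) is '1' if taxon i is in the clade, else '0'."""
    return "".join("1" if i in clade else "0" for i in range(1, n_taxa + 1))


for clade in internal_clades("((1,2),(3,4));"):
    print(bipartition_string(clade, 4))
# 1100
# 0011
# 1111
```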


To save the results of an analysis, click the Save Results... button in the lower right corner of the window, choose a location and file name, and click Save. The output file contains the bipartition table and the results by clade and character.