Where we can see that trees T1 and T2 differ only in the location of nodes A and B -- on these trees, we can naturally think of the nodes A, B, 1,..., 6 as representing leaves, but they might also be large subtrees.

The RF distance is the number of edges (=branches) that are unique to each tree (that's why it's also called the symmetric difference), and it may be normalized to one. If we highlight the unique edges on trees T1 and T2

We see that the (unnormalized) RF distance is 10. For dichotomic trees, the number of unique edges is the same on both trees.

The NNI distance is the minimum number of NNIs that must be applied to one tree such that it becomes equal to the other. One NNI branch swap will change exactly one edge, thus is very tempting to assume that the NNI distance can be found by looking at the distinct edges.

But the problem is when the same branch is involved in more than one path of the "NNI walk". The RF distance (divided by two, for fully resolved trees) is then a lower bound on the minimum number of NNIs. In our example:

The NNI distance between T1 and T2 is 6, one more than the RF distance since the edge splitting (1,2,3) and (4,5,6) is used twice in the NNI computation. The problem, as explained by Liam, is that simulating trees with a specified distance is hard, and the solution of using very large trees masks the cases where the distances disagree...

**Reference**:

Bryant D. (2004). The Splits in the Neighborhood of a Tree, Annals of Combinatorics, 8 (1) 1-11. DOI: 10.1007/s00026-004-0200-z

(Crossposted from Bioinformatics News and Reviews, my personal blog)

## No comments:

## Post a Comment

Use the space below to ask, inform and criticize -- if you are not very happy please read the rules for commenting.

Please, do not include unrelated, commercial sites not even in your signature.