Wednesday, November 30, 2016

Comparative methods in the genomic era

Every modern biologist should know about phylogenetic comparative methods, even if they are not familiar with the term. The idea is that we cannot compare biological traits without taking into account the evolutionary relations between the species/populations used as "sample points". This is because individuals are not independent samples of whatever variables we are trying to correlate (they are "pseudoreplicates").

However, not everybody keeps this in mind. In the figure below, birds with similar body masses may also have similar flight speeds not because the two traits are related, but because the birds themselves are closely related: not long ago they were a single species, with a single value for each trait. In abstract terms, the way to avoid such spurious correlations is to think about what trait values we expect for their ancestors. Hopefully we will have this in mind in upcoming large-scale genomic analyses, where sophisticated statistical algorithms applied to ever-increasing amounts of data might obscure whether the analysis is biologically appropriate.

(Figure from doi:10.1063/1.4886855)

Here I include a few links to good comments and papers discussing comparative methods, starting with Felsenstein's reaction to the paper above (which is about airplanes, by the way). I also include at the bottom the slides of an impromptu journal club I presented a few years ago about things we can model over a tree, with strong emphasis on phylogenetic contrasts.

Physicists and engineers decide how to analyze evolution (by Joe Felsenstein):
They make allometric plots of features of new airplane models, log-log plots over many orders of magnitude. The airplanes show allometry: did you know that a 20-foot-long airplane won’t have 100-foot-long wings? That you need more fuel to carry a bigger load?
But permit me a curmudgeonly point: This paper would have been rejected in any evolutionary biology journal. Most of its central citations to biological allometry are to 1980s papers on allometry that failed to take the phylogeny of the organisms into account. The points plotted in those old papers are thus not independently sampled, a requirement of the statistics used. (More precisely, their error residuals are correlated).
Steps towards understanding comparative methods – Wainwright Lab:
Felsenstein elaborates on his method of calculating standardized contrasts (phylogenetically independent contrasts) to help overcome the non-independence of character traits.  These contrasts are basically the differences between trait values of species pairs weighted by the evolutionary change separating them; they are estimates of the rate of change over time. A common use of standardized contrasts is to look for correlation in this rate between two traits; if standardized contrasts of traits X and Y are compared in a regression analysis, a linear trend suggests correlated rates of evolution between the traits.
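To make the description above concrete, here is a minimal sketch of standardized contrasts in plain Python. It is not taken from any of the linked sources: the four-species tree, its branch lengths, the trait values and all function names are made up for illustration, and the tree is assumed to be fully bifurcating with Brownian-motion trait evolution.

```python
import math

# Hypothetical toy tree: a leaf is (name, branch_length) and an internal
# node is (left_child, right_child, branch_length).
TREE = ((("A", 1.0), ("B", 1.0), 1.0),
        (("C", 1.0), ("D", 1.0), 1.0),
        0.0)

# Made-up trait values for the four tips.
X = {"A": 1.0, "B": 2.0, "C": 10.0, "D": 11.0}
Y = {"A": 2.0, "B": 1.0, "C": 11.0, "D": 10.0}


def pic(node, trait, out):
    """Post-order pass: return (value, adjusted branch length) for `node`
    and append one standardized contrast per internal node to `out`."""
    if len(node) == 2:                      # leaf: (name, branch_length)
        name, length = node
        return trait[name], length
    left, right, length = node
    x1, v1 = pic(left, trait, out)
    x2, v2 = pic(right, trait, out)
    # Difference between the two daughters, scaled by the square root of
    # the branch length (expected variance) separating them.
    out.append((x1 - x2) / math.sqrt(v1 + v2))
    # Ancestral value: average of the daughters weighted by 1 / branch length.
    value = (x1 / v1 + x2 / v2) / (1.0 / v1 + 1.0 / v2)
    # Lengthen the branch below this node to account for the uncertainty
    # in the estimated ancestral value.
    return value, length + v1 * v2 / (v1 + v2)


def standardized_contrasts(tree, trait):
    out = []
    pic(tree, trait, out)
    return out


def correlation_through_origin(a, b):
    """Correlation of two contrast vectors without centering them."""
    num = sum(p * q for p, q in zip(a, b))
    return num / math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))


cx = standardized_contrasts(TREE, X)
cy = standardized_contrasts(TREE, Y)
r = correlation_through_origin(cx, cy)
print(cx, cy, r)
```

Four species yield only three contrasts, one per internal node, and in this toy data set the two within-clade contrasts have opposite signs for X and Y even though the raw tip values look strongly positively correlated. The contrasts are compared through the origin because the sign of each contrast depends on an arbitrary choice of which daughter is subtracted from which.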
Strong phylogenetic inertia on genome size and transposable element content among 26 species of flies | Biology Letters doi:10.1098/rsbl.2016.0407:
To date, quantifying the importance of phylogenetic inertia in TE content distribution remains a key question as the dynamic of TE accumulation is still poorly understood. Here, we analysed the evolution of genome size and genomic TE content in 26 Drosophila, using a phylogenetic framework. We estimated genomic TE content using a de novo TE assembly approach, tested the correlation between TE content and genome size among closely related species and finally estimated the phylogenetic inertia.
Comparative analyses were performed using ape [21], nlme [22] and phytools [23] packages in R. Ancestral trait reconstruction of genome size was calculated using phylogenetic independent contrasts. We tested the phylogenetic signal using Pagel's λ [24]. Best-fitting model to the trait evolution and its covariance structure was tested among (i) absence of phylogenetic signal, (ii) neutral Brownian motion and (iii) constrained evolution Ornstein–Uhlenbeck (OU) models using generalized least squares (GLS) and selected according to minimum Akaike information criterion (AIC).
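The model comparison described in this excerpt can be illustrated with a small Python sketch. This is not the authors' code and does not reproduce what ape, nlme or phytools do internally: it is an intercept-only GLS likelihood under two fixed covariance structures, Pagel's λ = 0 (no phylogenetic signal) and λ = 1 (Brownian motion), compared by AIC. The OU model is left out, and the covariance matrix, trait values and function names are hypothetical.

```python
import math

# Hypothetical Brownian-motion covariance for a tree ((A:1,B:1):1,(C:1,D:1):1):
# entry (i, j) is the branch length shared by tips i and j from the root.
C_BM = [[2.0, 1.0, 0.0, 0.0],
        [1.0, 2.0, 0.0, 0.0],
        [0.0, 0.0, 2.0, 1.0],
        [0.0, 0.0, 1.0, 2.0]]
TRAIT = [1.0, 2.0, 10.0, 11.0]   # made-up trait values for tips A, B, C, D


def lambda_transform(c, lam):
    """Pagel's lambda: scale the off-diagonal covariances by `lam`."""
    n = len(c)
    return [[c[i][j] if i == j else lam * c[i][j] for j in range(n)]
            for i in range(n)]


def cholesky(a):
    """Lower-triangular factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def solve(low, b):
    """Solve (low * low^T) x = b by forward and back substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def aic(y, c, n_params=2):
    """AIC of y ~ N(mu * 1, sigma2 * c), with mu and sigma2 at their ML values."""
    n = len(y)
    low = cholesky(c)
    mu = sum(solve(low, y)) / sum(solve(low, [1.0] * n))   # GLS estimate of the mean
    resid = [v - mu for v in y]
    sigma2 = sum(r * w for r, w in zip(resid, solve(low, resid))) / n
    logdet = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    loglik = -0.5 * (n * math.log(2.0 * math.pi * sigma2) + logdet + n)
    return 2 * n_params - 2.0 * loglik


aic_star = aic(TRAIT, lambda_transform(C_BM, 0.0))   # no phylogenetic signal
aic_bm = aic(TRAIT, lambda_transform(C_BM, 1.0))     # Brownian motion
print(aic_star, aic_bm)
```

The model with the lower AIC is preferred. In a real analysis λ would typically be estimated rather than fixed, which adds one parameter to the AIC penalty, and the regression would usually include predictors rather than only an intercept.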
Controlling for non-independence in comparative analysis of patterns across populations within species | Philosophical Transactions of the Royal Society B: Biological Sciences doi:10.1098/rstb.2010.0311:
How do we quantify patterns (such as responses to local selection) sampled across multiple populations within a single species? Key to this question is the extent to which populations within species represent statistically independent data points in our analysis. Comparative analyses across species and higher taxa have long recognized the need to control for the non-independence of species data that arises through patterns of shared common ancestry among them (phylogenetic non-independence), as have quantitative genetic studies of individuals linked by a pedigree.
The Unsolved Challenge to Phylogenetic Correlation Tests for Categorical Characters | Systematic Biology doi:10.1093/sysbio/syu070:
When comparative biologists observe that animal species living in caves also tend to have reduced eyes, they may see such correlation as evidence that the traits are adaptively or functionally linked: for instance, selection to maintain eye function is relaxed when light is unavailable.
However, the last few decades have taught us that among-species correlative tests should take into account evolutionary relationships (Felsenstein 1985; Ridley 1989; Harvey and Pagel 1991). If phylogeny is not taken into account, an interpreted correlation may have a trivial explanation different from the biological relationship we claim. There is a correlation among species in the distribution of fur and bones in the middle ear—species with fur also have three bones in the middle ear, and vice versa. These two traits are characteristics of mammals, and absent outside the mammals. Using their shared distribution as evidence of an interesting biological relationship between fur and middle ear bones would be considered a mistake, however, for reasons understood long ago by Darwin (1872)
