Using expectation-maximization to infer the early evolution of spliceosomal introns

Liran Carmel, NIH/NLM/NCBI

We propose a detailed model of the evolution of the exon-intron structure of eukaryotic genes. The model takes into account gene-specific intron gain and loss rates, branch-specific gain and loss coefficients, invariant sites incapable of intron gain, and gamma-distributed variability of both gain and loss rates across sites. We develop an expectation-maximization algorithm to estimate the parameters of this model. Using the fitted model, we estimate the intron density of early eukaryotes and identify regions of the eukaryotic phylogenetic tree with high rates of gain or loss. We are able to reject the intron-early hypothesis, as well as the extreme intron-late viewpoint. Instead, we find a kaleidoscope of gain and loss events.
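
To illustrate how expectation-maximization can handle invariant sites, the following is a minimal toy sketch in Python, not the model described in this abstract. It assumes a much simpler setting with no phylogenetic tree and no gamma rate variation: each site is either invariant (it never carries an intron) with probability p_inv, or variable, in which case each of n independent lineages carries an intron with probability q. The function names, the data layout, and the zero-inflated binomial form are all assumptions made for illustration.

```python
import math


def log_likelihood(counts, n, p_inv, q):
    """Log-likelihood of the toy zero-inflated binomial model (an assumption)."""
    total = 0.0
    for k in counts:
        p_var = math.comb(n, k) * q ** k * (1.0 - q) ** (n - k)
        p_site = (p_inv if k == 0 else 0.0) + (1.0 - p_inv) * p_var
        total += math.log(p_site)
    return total


def em_invariant_sites(counts, n, iters=500, tol=1e-12):
    """Estimate (p_inv, q) by EM.

    counts[i] is the number of lineages (out of n) that carry an intron
    at site i. Only all-absent sites (count 0) can be invariant.
    """
    p_inv, q = 0.5, 0.5  # arbitrary starting values
    n_sites = len(counts)
    for _ in range(iters):
        # E-step: posterior probability that an all-absent site is invariant.
        p_zero_if_variable = (1.0 - q) ** n
        w = p_inv / (p_inv + (1.0 - p_inv) * p_zero_if_variable)
        expected_invariant = sum(w for k in counts if k == 0)

        # M-step: closed-form updates given the expected assignments.
        new_p_inv = expected_invariant / n_sites
        expected_variable = n_sites - expected_invariant
        new_q = sum(counts) / (n * expected_variable)

        converged = abs(new_p_inv - p_inv) < tol and abs(new_q - q) < tol
        p_inv, q = new_p_inv, new_q
        if converged:
            break
    return p_inv, q


if __name__ == "__main__":
    # Hypothetical data: 100 sites observed in 4 lineages.
    data = [0] * 70 + [1] * 15 + [2] * 10 + [3] * 5
    p_inv_hat, q_hat = em_invariant_sites(data, n=4)
    print(f"p_inv = {p_inv_hat:.4f}, q = {q_hat:.4f}")
```

The E-step only has to decide how much of the all-absent class to attribute to invariant sites, because any site with at least one intron must be variable. A full treatment along the lines described above would replace the binomial term with a likelihood computed over the phylogenetic tree and would add rate categories for the gamma-distributed variability.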