Reliable protein function prediction by Bayesian phylogenomics

Steven Brenner, Department of Plant and Microbial Biology, Berkeley

Most of what we know about most proteins' functions comes from sequence comparison. Roughly 250,000,000,000,000 BLAST comparisons are made each year at NCBI alone, largely for this purpose. However, the process of predicting protein function using homology remains problematic, and pairwise methods of sequence comparison are systematically flawed. As a result, accurate functional annotation of genomes continues to be a major challenge.
Phylogenomics has emerged as a promising method of reliable and specific protein function annotation. This methodology incorporates an explicit phylogenetic analysis and all known functional descriptions for a family. Unfortunately, its manual application is exquisitely time-consuming; it depends upon detailed study by domain experts and offers no consistent means of reporting confidence in results.
We have constructed a statistical graphical model to infer specific molecular function for unannotated protein sequences using phylogenomic principles. SIFTER (Statistical Inference of Function Through Evolutionary Relationships) accurately predicts function for members of a protein family using a reconciled phylogeny and available function annotations, even when the data are sparse or noisy. Given a phylogenetic tree, the algorithm operates in time that grows linearly with the number of proteins. The first implementation of SIFTER yields more correct protein function annotations than any other available method. Numerous planned enhancements will further improve SIFTER's effectiveness and applicability.
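
To make the idea of propagating annotations over a tree concrete, the following is a minimal sketch of two-pass message passing on a rooted phylogeny. It is not SIFTER's actual model: the single-function-per-protein state space, the symmetric transition matrix, the uniform root prior, and every name in it (Node, transition_matrix, infer, p_change) are assumptions made for illustration only. Each node is visited once on the way up and once on the way down, so for trees of bounded degree the work grows linearly with the number of proteins.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    observed: Optional[int] = None  # index of a known function (leaves only)


def transition_matrix(n_states: int, p_change: float) -> List[List[float]]:
    """Hypothetical model: keep the parent's function with probability
    1 - p_change, otherwise switch uniformly to another function."""
    off = p_change / (n_states - 1)
    return [[1.0 - p_change if i == j else off for j in range(n_states)]
            for i in range(n_states)]


def message(T, child_like):
    """Likelihood of the evidence below a child, for each parent state."""
    n = len(T)
    return [sum(T[s][t] * child_like[t] for t in range(n)) for s in range(n)]


def upward(node, T, like):
    """Post-order pass: like[name][s] = P(annotations below node | node = s)."""
    n = len(T)
    if not node.children:
        # Annotated leaf: indicator vector. Unannotated leaf: all ones.
        like[node.name] = [1.0 if node.observed in (None, s) else 0.0
                           for s in range(n)]
        return
    vec = [1.0] * n
    for child in node.children:
        upward(child, T, like)
        vec = [v * m for v, m in zip(vec, message(T, like[child.name]))]
    like[node.name] = vec


def downward(node, T, like, outside, post):
    """Pre-order pass: combine evidence from outside the subtree with the
    upward likelihood to obtain a normalized posterior at every node."""
    n = len(T)
    joint = [outside[s] * like[node.name][s] for s in range(n)]
    z = sum(joint)
    post[node.name] = [j / z for j in joint]
    msgs = [message(T, like[c.name]) for c in node.children]
    for i, child in enumerate(node.children):
        # Evidence from everything except this child's own subtree.
        rest = list(outside)
        for j, m in enumerate(msgs):
            if j != i:
                rest = [r * x for r, x in zip(rest, m)]
        child_out = [sum(rest[s] * T[s][t] for s in range(n)) for t in range(n)]
        downward(child, T, like, child_out, post)


def infer(root: Node, n_states: int, p_change: float) -> Dict[str, List[float]]:
    T = transition_matrix(n_states, p_change)
    like: Dict[str, List[float]] = {}
    post: Dict[str, List[float]] = {}
    upward(root, T, like)
    downward(root, T, like, [1.0 / n_states] * n_states, post)  # uniform prior
    return post


# Toy family: A and C are annotated, B is not.
tree = Node("root", [Node("X", [Node("A", observed=0), Node("B")]),
                     Node("C", observed=1)])
post = infer(tree, n_states=2, p_change=0.1)
print(post["B"])  # B's posterior leans toward function 0, as in its sibling A
```

In this toy family the unannotated protein B inherits most of its probability mass from its close relative A rather than from the more distant C, which is the qualitative behaviour a phylogenomic approach is meant to capture. The posterior also gives a natural confidence score for each prediction.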

Reference:
Engelhardt BE, Jordan MI, Muratore KE, Brenner SE. 2005. Protein molecular function prediction by Bayesian phylogenomics. PLoS Computational Biology 1: e45.