Information-theoretic analysis of biological data

Noam Slonim, Princeton University

In recent years, researchers have faced a rapid increase in available biological data. These data come in a variety of forms: complete genome sequences, mRNA transcriptional profiles, protein-protein interactions, and so forth. Automatic data analysis methods are often the only route to extracting meaningful insights from these data. Existing techniques, however, typically rely on nontrivial assumptions. These assumptions may be explicit, as in assuming a specific model that reflects one's prior beliefs about the data, or implicit, as in arbitrarily specifying a correlation or "similarity" measure that lies at the core of any further analysis. While it is clear that such assumptions should be avoided, the conventional wisdom is that in practice they are unavoidable. In this talk I will describe an information-theoretic framework that, for a wide variety of problems, makes it possible to extract biologically important insights without any prior assumptions about the nature of the data. I will briefly discuss several recent applications of this approach and will present in more detail results for systematic genotype-phenotype association in bacteria and archaea.