Information theoretic analysis of biological data
Noam Slonim, Princeton University
In recent years, researchers have faced a rapid increase in the amount of
available biological data. These data come in a variety of forms: complete
genome sequences, mRNA transcriptional profiles, protein-protein
interactions, and so forth. Automatic data analysis methods are often the
only route to extracting meaningful insights from these data. Existing
techniques, however, typically employ nontrivial assumptions. These
assumptions might be explicit, as in assuming a specific model which
reflects one's prior beliefs about the data; or implicit, as in arbitrarily
specifying a correlation or a "similarity" measure which lies at the core
of any further analysis. While it is clear that such assumptions should be
avoided, the conventional wisdom is that in practice they are actually
unavoidable. In this talk I will describe an information theoretic framework
that, for a wide variety of problems, allows one to extract biologically
important insights without any prior assumptions about the nature of the data. I
will briefly discuss several recent applications of this approach, and will
present in more detail results for systematic genotype-phenotype association
in bacteria and archaea.