Finding Informative Regulatory Elements

Noam Slonim
IBM Research

Gene expression is directly regulated by protein transcription factors that bind at particular DNA or RNA sites in a sequence-specific manner. A comprehensive characterization of these functional non-coding elements, or motifs, remains a formidable challenge, especially for higher eukaryotes.
In this talk, I will present a rigorous computational methodology for ab initio motif discovery from expression data that utilizes the concept of mutual information and has the following characteristics:
  1. is directly applicable to any type of expression data, thus conceptually unifying existing motif discovery techniques,
  2. is model-independent, i.e., does not require the model-related assumptions commonly made by other methods,
  3. simultaneously finds DNA motifs in upstream regions and RNA motifs in 3'UTRs and highlights their functional relations,
  4. scales well to metazoan genomes,
  5. yields very few false positive predictions, if any,
  6. systematically characterizes predicted motifs in terms of functional coherence, conservation, positional and orientation biases, cooperativity, and co-localization with other motifs,
  7. displays predictions via a novel user-friendly graphical interface.
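
To make the role of mutual information concrete, here is a minimal sketch of how the dependence between a motif's presence and an expression profile could be scored. It is an illustration under stated assumptions and not the FIRE implementation: the function names, the exact substring match used for motif presence, and the toy sequences and cluster labels are all hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of mutual information (in bits) between two
    equal-length sequences of discrete values."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n == 0:
        return 0.0
    joint = Counter(zip(xs, ys))  # counts of (x, y) pairs
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def motif_presence(sequences, kmer):
    """Hypothetical presence profile: 1 if the k-mer occurs in the
    sequence, else 0 (exact substring match, one strand only)."""
    return [1 if kmer in seq else 0 for seq in sequences]


# Toy data, invented for illustration: one upstream sequence per gene
# and a discrete expression cluster label per gene.
upstream = ["ACGTGACC", "TTACGTGA", "GGGTTTAA", "CCCTTTGG"]
clusters = ["up", "up", "down", "down"]

presence = motif_presence(upstream, "ACGTG")
score = mutual_information(presence, clusters)
```

In this toy case the k-mer occurs in exactly the genes labeled "up", so the score is high; a k-mer whose presence is unrelated to the cluster labels would score near zero. Continuous expression values would first need to be discretized, which is a further assumption of this sketch.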
I will present results for a variety of data types, measured in different organisms, including yeast, worm, fly, mouse, human, and the Plasmodium parasite responsible for malaria. I will further discuss in detail surprising observations regarding gene expression regulation that were overlooked by previous studies and that arise naturally from our analysis. As shorthand for our methodology, we use the acronym FIRE, which stands for Finding Informative Regulatory Elements.

Based on joint work with Olivier Elemento and Saeed Tavazoie.