Finding Informative Regulatory Elements
Noam Slonim
IBM Research
Gene expression is directly regulated by protein transcription factors
that bind at particular DNA or RNA sites in a sequence-specific manner.
A comprehensive characterization of these functional non-coding elements, or
motifs, remains a formidable challenge, especially for higher
eukaryotes.
In this talk, I will present a rigorous computational methodology for
ab initio motif discovery from expression data. It utilizes the concept
of mutual information and has the following characteristics:
- applies directly to any type of expression data, thus conceptually
unifying existing motif discovery techniques,
- is model-independent, i.e., does not require the model-related
assumptions commonly made by other methods,
- simultaneously finds DNA motifs in upstream regions and RNA motifs
in 3'UTRs and highlights their functional relations,
- scales well to metazoan genomes,
- yields very few, if any, false-positive predictions,
- systematically characterizes predicted motifs in terms of
functional coherence, conservation, positional and orientation biases,
cooperativity, and co-localization with other motifs,
- displays predictions via a novel user-friendly graphical
interface.
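To make the role of mutual information concrete, here is a minimal sketch
of how one might score a candidate motif against expression data. It is not
the FIRE implementation: it assumes, purely for illustration, that expression
has already been discretized into cluster labels and that a motif is an exact
substring. The names mutual_information and motif_information and the toy
sequences are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information, in bits, between two discrete sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of (x, y) pairs
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def motif_information(motif, sequences, labels):
    """Score a motif by the information its presence carries about the labels.

    Assumption: presence is an exact substring match, a toy stand-in for
    whatever motif representation a real method would use.
    """
    present = [motif in seq for seq in sequences]
    return mutual_information(present, labels)


# Hypothetical toy data: four regulatory sequences and their expression clusters.
sequences = ["ACGTGA", "TTACGT", "GGGGGG", "CCCCCC"]
labels = [0, 0, 1, 1]

informative = motif_information("ACGT", sequences, labels)
weaker = motif_information("G", sequences, labels)
```

In this toy data the motif ACGT occurs in exactly the genes of cluster 0, so
its presence determines the cluster and it scores higher than G, which occurs
in both clusters. A score like this makes no assumption about the form of the
relationship between motif and expression, which is the sense in which an
information-based criterion can be model-independent.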
I will present results for a variety of data types, measured for
different organisms, including yeast, worm, fly, mouse, human, and the
Plasmodium parasite responsible for malaria. I will further discuss in
detail surprising observations regarding gene expression regulation that were
overlooked by previous studies and naturally arise from our analysis.
As a shorthand for our methodology we use the acronym FIRE, standing for
Finding Informative Regulatory Elements.
Based on joint work with Olivier Elemento and Saeed Tavazoie.