A Study of Accessible Motifs and RNA Folding Complexity

Michal Ziv-Ukelson
TAU

mRNA molecules are folded in the cells and therefore many of their substrings may actually be inaccessible to protein and microRNA binding. The need to apply an accessability criterion to the task of genome-wide mRNA motif discovery raises the challenge of overcoming the core O(n^3) factor imposed by the time complexity of the currently best known algorithms for RNA secondary structure prediction [Zuker et al., Nussinov et al.]. We speed up the dynamic programming algorithms that are standard for RNA folding prediction. Our new approach significantly reduces the computations without sacrificing the optimality of the results, yielding an expected time complexity of O(n^2 \psi(n)), where \psi(n) is shown to be constant on average under standard polymer folding models. Benchmark analysis confirms that in practice the runtime ratio between the previous approach and the new algorithm indeed grows linearly with increasing sequence size.
The fast new RNA folding algorithm is utilized for genome-wide discovery of accessible cis-regulatory motifs in data sets of ribosomal densities and decay rates of S. cerevisiae genes and to the mining of exposed binding sites of tissue-specific microRNAs in A. Thaliana.

This paper, joint work with Ydo Wexler and Chaya Zilberstein, has received the RECOMB2006 Special Mention.