A Study of Accessible Motifs and RNA Folding Complexity
Michal Ziv-Ukelson
TAU
mRNA molecules are folded in the cells and therefore many of
their substrings may actually be inaccessible to protein and
microRNA binding. The need to apply an accessability criterion
to the task of genome-wide mRNA motif discovery raises the
challenge of overcoming the core O(n^3) factor imposed by the
time complexity of the currently best known algorithms for RNA
secondary structure prediction [Zuker et al., Nussinov et al.].
We speed up the dynamic programming algorithms that are
standard for RNA folding prediction. Our new approach
significantly reduces the computations without sacrificing
the optimality of the results, yielding an expected time
complexity of O(n^2 \psi(n)), where \psi(n) is shown to be
constant on average under standard polymer folding models.
Benchmark analysis confirms that in practice the runtime ratio
between the previous approach and the new algorithm indeed
grows linearly with increasing sequence size.
The fast new RNA folding algorithm is utilized for genome-wide
discovery of accessible cis-regulatory motifs in data sets of
ribosomal densities and decay rates of S. cerevisiae genes and
to the mining of exposed binding sites of tissue-specific
microRNAs in A. Thaliana.
This paper, joint work with Ydo Wexler and Chaya Zilberstein,
has received the RECOMB2006 Special Mention.