RNAspa: A Breadth-First Search Algorithm to Simultaneously Align and Predict the Secondary Structure of ncRNA Molecules

Yair Horesh
Bar-Ilan University

Motivation: In recent years, RNA molecules that are not translated into proteins (ncRNAs) have drawn a great deal of attention, as they were shown to be involved in many cellular functions. These molecules are difficult to detect, through either experimental or computational approaches. To aid in ncRNA detection, we attempted to solve the following problem: Given a set of unaligned ncRNA molecules that are taken from the same family (e.g. have similar function) we wished to simultaneously predict their secondary structure and align their sequences.
Results: We developed the RNAspa program to simultaneously align and predict the secondary structure of ncRNA molecules. The algorithm builds on the ability of the Vienna Package to include at least one "correct" or almost "correct" structure within the set of suboptimal solutions that it outputs. The suboptimal solutions are represented in a graph. The shortest path in this graph is a good alignment and a structure prediction. The algorithm takes advantage of two observations: RNA secondary structures can be compared very rapidly by a simple string Edit-Distance algorithm with a negligible loss of accuracy, and that the order of sequences being aligned, have a weak impact on the quality of the alignment. Therefore, only a few runs with the input sequences in different random orders are required. The algorithm was tested on eight datasets taken from the Rfam database. RNAspa performed better and ran faster than other programs.