RNAspa: A Breadth-First Search Algorithm to Simultaneously Align and
Predict the Secondary Structure of ncRNA Molecules
Yair Horesh
Bar-Ilan University
Motivation: In recent years, RNA molecules that are not translated into
proteins (ncRNAs) have drawn a great deal of attention, as they were shown
to be involved in many cellular functions. These molecules are difficult
to detect, through either experimental or computational approaches. To aid
in ncRNA detection, we attempted to solve the following problem: Given a
set of unaligned ncRNA molecules that are taken from the same family (e.g.
have similar function) we wished to simultaneously predict their secondary
structure and align their sequences.
Results: We developed the RNAspa program to simultaneously align and predict
the secondary structure of ncRNA molecules. The algorithm builds on the
ability of the Vienna Package to include at least one "correct" or almost
"correct" structure within the set of suboptimal solutions that it
outputs. The suboptimal solutions are represented in a graph. The shortest
path in this graph is a good alignment and a structure prediction. The
algorithm takes advantage of two observations: RNA secondary structures
can be compared very rapidly by a simple string Edit-Distance algorithm
with a negligible loss of accuracy, and that the order of sequences being
aligned, have a weak impact on the quality of the alignment. Therefore,
only a few runs with the input sequences in different random orders are
required. The algorithm was tested on eight datasets taken from the Rfam
database. RNAspa performed better and ran faster than other programs.