Discovering DNA motifs from sequence and expression datasets
Allegro is a software tool for simultaneous discovery of cis-regulatory
motifs and their associated expression profiles. Its inputs are DNA sequences
(typically, promoters or 3' UTRs) and genome-wide expression profiles.
Its output is the set of motifs found,
and for each motif the set of genes it regulates (its transcriptional
module). Allegro is highly efficient and can analyze expression profiles
of thousands of genes, measured across dozens of experimental conditions,
along with all regulatory sequences in the genome.
Allegro has a user-friendly graphical user interface.
As of 2009, the Allegro software also integrates the Amadeus motif-finding algorithm.
Updated files with known regulatory motifs from public databases:
JASPAR 5.0_ALPHA for transcription factor binding sites.
miRBase 21.0 (updated) for micro-RNA motifs.
Please download the files from the download page.
Amadeus is now integrated in the Expander software suite for gene expression analysis.
In addition to the stand-alone software available until now, it can also be executed as part of
the Expander analysis flow.
The promoters and 3' UTRs available in Allegro have been revised. They are available together with the software from
the download page.
AmadeusPBM is now available. This new addition to the software is a tool
for extracting binding site motifs from protein binding microarray (PBM) data.
You can download the new AmadeusPBM from the download page.
Updated micro-RNA seed motifs from miRBase 12.0 to miRBase 16.0.
Please download the new files from the download page.
The DREAM5 Challenge -
Amadeus ranked first in the challenge
for identifying the binding site motifs from 66 PBM (Protein Binding Microarray) datasets (see below).
Motif discovery tasks
The Allegro software package also includes our recently published
Amadeus motif discovery platform. Thus, the integrated software supports
the following motif finding tasks:
- Expression data analysis
Input: Regulatory sequences, expression dataset(s)
Output: Sequence motifs and their associated expression profiles
- Target set analysis
Input: Regulatory sequences, set(s) of co-regulated genes
Output: Motifs that are over-represented in one or more of the given sets
- Sequence-only analysis
Input: Regulatory sequences
Output: Motifs with global spatial features, i.e., motifs that appear
non-uniformly along the promoters, between the two strands, or among the chromosomes
- PBM analysis
Input: PBM files
Output: Binding site motifs of the TF that was used in the protein binding microarray experiment
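To make the target set analysis task more concrete, here is a minimal Python sketch of one common way to score over-representation: a hypergeometric tail probability for an exact k-mer, counted on both strands. It is only an illustration under stated assumptions, not the scoring scheme implemented in Amadeus or Allegro. The function names, the toy promoters, and the motif are all hypothetical.

```python
from math import comb

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def contains_motif(seq: str, motif: str) -> bool:
    """True if the exact k-mer occurs on either strand of seq."""
    return motif in seq or reverse_complement(motif) in seq


def hypergeom_tail(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) when n genes are drawn from N, of which K carry the motif."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def target_set_enrichment(promoters: dict, target_set: set, motif: str):
    """Return (hits in target set, hypergeometric p-value) for one motif."""
    hits = {gene for gene, seq in promoters.items() if contains_motif(seq, motif)}
    targets = target_set & promoters.keys()  # ignore genes without a sequence
    k = len(hits & targets)
    return k, hypergeom_tail(len(promoters), len(hits), len(targets), k)


# Hypothetical toy input: six promoters, three of them in the target set.
toy_promoters = {
    "g1": "AAATGACTCAGG",
    "g2": "CCTGAGTCATTT",  # motif occurs on the reverse strand
    "g3": "ACGTACGTACGT",
    "g4": "GGGGCCCCAAAA",
    "g5": "TTTTGACTCATT",
    "g6": "CACACACACACA",
}
toy_targets = {"g1", "g2", "g5"}

hits_in_targets, p_value = target_set_enrichment(toy_promoters, toy_targets, "TGACTCA")
```

In this toy example all three motif-containing promoters fall in the target set. A real analysis would search over many candidate motifs, use degenerate motif models rather than exact k-mers, and correct for multiple testing and sequence composition.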
Amadeus can also be used to identify binding site motifs from Protein Binding
Microarray (PBM) data. A PBM dataset for a specific TF contains the measured
binding intensity for each probe sequence; together, the probes cover all possible 10-mers.
In a blind competition (DREAM5 Challenge, Sep. '10)
Amadeus ranked first (tied with one other group) in
identifying the binding site motifs of 66 TF datasets. Running time is a few seconds per dataset.
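To illustrate the shape of PBM data described above, the following Python sketch ranks k-mers by the median intensity of the probes that contain them, treating a k-mer and its reverse complement as the same word. This is a generic, simplified heuristic shown for orientation only and is not the algorithm used by Amadeus. The function names, the `min_probes` parameter, and the toy probes are hypothetical.

```python
from collections import defaultdict
from statistics import median

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Strand-independent representative of a k-mer."""
    return min(kmer, reverse_complement(kmer))


def rank_kmers(probes, k: int, min_probes: int = 2):
    """Rank k-mers by the median intensity of the probes containing them.

    probes: iterable of (sequence, intensity) pairs.
    Returns a list of (median_intensity, kmer), highest median first.
    """
    intensities = defaultdict(list)
    for seq, intensity in probes:
        # Count each k-mer at most once per probe.
        seen = {canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)}
        for kmer in seen:
            intensities[kmer].append(intensity)
    ranked = [
        (median(values), kmer)
        for kmer, values in intensities.items()
        if len(values) >= min_probes  # skip k-mers seen in too few probes
    ]
    return sorted(ranked, reverse=True)


# Hypothetical toy probes: (sequence, measured binding intensity).
toy_probes = [
    ("AATGACGT", 9.0),
    ("CCTGACAA", 8.0),
    ("GGGGCCCC", 1.0),
    ("ACACACAC", 1.5),
    ("TTGGGGAA", 2.0),
]

ranking = rank_kmers(toy_probes, k=4)
```

Here the word shared by the two high-intensity probes (TGAC, reported as its canonical form GTCA) ranks first. Real PBM arrays contain tens of thousands of probes, and practical tools model degenerate motifs rather than single exact words.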
Allegro and Amadeus are described in our papers:
"Allegro: Analyzing expression and sequence in concert
to discover regulatory programs",
Y. Halperin, C. Linhart, I. Ulitsky and R. Shamir,
Nucleic Acids Research, vol. 37:5, pp. 1566-1579, April 2009.
"Transcription factor and microRNA motif discovery:
The Amadeus platform and a compendium of metazoan target sets",
C. Linhart, Y. Halperin and R. Shamir,
Genome Research, vol. 18:7, pp. 1180-1189, July 2008.