Overview 

Download and install

The Amadeus software, as well as files with regulatory sequences of various organisms, can be downloaded from the Amadeus download page. After you download the zip file, please extract its contents and follow the installation instructions in the "README.txt" file.

Note that Amadeus requires Java version 1.5 (get Java here).

How to run Amadeus

Following are step-by-step instructions on how to run Amadeus and analyze its output.
  Screenshot of the Amadeus input panel

  1. Choose the sequence type
    Choose the type of sequences you would like to analyze: promoters (to analyze both strands) or 3' UTRs (single strand).

  2. Set files
    Choose the organism and assign its sequences and background (BG) & target sets.
    When you choose an organism, the default sequences and BG file are those that you can download from the download page. Use the "Browse" button to select different files. Note that the sequences file should be in FASTA format, in which the header of each sequence contains its name (>geneName), and the TSS of each gene is assumed to be located at the end of the sequence. Repetitive elements and other sequences you'd like to ignore in the analysis, such as protein-coding sequences, should be masked out with N's.
    The "BG file" field can be left empty when you want all the genes in the sequences file to be used as the BG set. Otherwise, the supplied BG file should contain a list of gene names, one per line. The same format applies to the target set, which should be a subset of the BG set. Typically, the BG set is much larger than the target set, e.g., the BG set consists of all the genes represented on the microarray, and the target set is a group of co-expressed genes.
    When using our sequences files, the gene names must be either Ensembl or Entrez gene ids.
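Before loading files into Amadeus, it can help to check that they match the formats described above. The following is a minimal sketch, not part of Amadeus: the function names (`read_fasta`, `read_gene_list`, `check_sets`) are hypothetical, and it assumes that the gene name is the first whitespace-delimited token after `>` in each header.

```python
from io import StringIO


def read_fasta(handle):
    """Return {gene_name: sequence} from a FASTA stream with '>geneName' headers."""
    parts_by_gene = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the gene name is the first token after '>'.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            parts_by_gene[name] = []
        elif name is not None:
            parts_by_gene[name].append(line.upper())
    return {gene: "".join(parts) for gene, parts in parts_by_gene.items()}


def read_gene_list(handle):
    """Return the set of gene names from a stream with one name per line."""
    return {line.strip() for line in handle if line.strip()}


def check_sets(sequences, bg, target):
    """Return a list of problems; an empty list means the sets are consistent."""
    problems = []
    missing = sorted(bg - set(sequences))
    if missing:
        problems.append(f"BG genes without a sequence: {missing}")
    outside = sorted(target - bg)
    if outside:
        problems.append(f"target genes not in BG: {outside}")
    return problems


# In-memory example; real use would pass open file handles instead.
sequences = read_fasta(StringIO(">ENSG0001\nACGTNNNN\nacgt\n>ENSG0002\nGGGG\n"))
bg = read_gene_list(StringIO("ENSG0001\nENSG0002\n"))
target = read_gene_list(StringIO("ENSG0001\n"))
print(check_sets(sequences, bg, target))
```

The gene ids in the example are made up for illustration. An empty list printed at the end indicates that every BG gene has a sequence and that the target set is a subset of the BG set.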

    Important notes:
    (1) We recommend that the number of BG sequences be at least 5 times larger than the number of target-set sequences. In general, the larger the BG set, the better the chances that Amadeus will recover the regulatory motifs. The size of the BG set is limited to roughly 65,000 sequences. Each sequence may be up to 16,000 bases long.
    (2) The BG set should be of the same nature as the target set. For example, if the target set consists of genes that were up-regulated in a gene expression experiment, then the BG set should contain all the genes on the expression chip.
    (3) By default, Amadeus reports motifs that occur in at most 25% of the background sequences; elements that appear more frequently are often not biologically interesting. There are additional parameters with default values that cannot be controlled via the graphical user interface. If you wish to modify the default parameters of Amadeus, or execute the program in batch (command-line) mode, please contact us.
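The size recommendations in note (1) can be checked ahead of time with a few lines of code. This is a hedged sketch rather than anything Amadeus itself runs: `size_warnings` is a hypothetical helper, and the 65,000 constant mirrors the "roughly 65,000" figure above, so the exact limit enforced by the program may differ.

```python
MIN_BG_TO_TARGET_RATIO = 5     # recommendation from note (1)
MAX_BG_SEQUENCES = 65_000      # "roughly 65,000"; the exact limit is an assumption
MAX_SEQUENCE_LENGTH = 16_000   # maximum bases per sequence


def size_warnings(sequence_lengths, n_bg, n_target):
    """Return warnings for inputs that fall outside the documented limits.

    sequence_lengths maps gene name -> sequence length in bases.
    """
    warnings = []
    if n_bg < MIN_BG_TO_TARGET_RATIO * n_target:
        warnings.append("BG set is less than 5 times the size of the target set")
    if n_bg > MAX_BG_SEQUENCES:
        warnings.append("BG set exceeds roughly 65,000 sequences")
    too_long = sorted(
        name for name, length in sequence_lengths.items()
        if length > MAX_SEQUENCE_LENGTH
    )
    if too_long:
        warnings.append(f"sequences longer than 16,000 bases: {too_long}")
    return warnings


print(size_warnings({"geneA": 1000, "geneB": 20000}, n_bg=400, n_target=100))
```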

    After all the fields have been assigned, press the "Add" button. Repeat these steps in order to add more organisms/target sets for the analysis.

  3. Set general parameters
    Running mode: choose between faster execution and more comprehensive analysis.
    From/to position: determine the range of sequences that will be scanned for the motifs.
    Known motifs DB: a file that contains PWMs in Transfac format. The motifs discovered by Amadeus are compared to the PWMs in the file, and similarities are reported. By default, Amadeus uses Transfac and miRBase for comparison in promoter and 3'-UTR analysis, respectively.
    Analyze pairs: choose whether to perform motif-pair analysis, which searches for co-occurring motifs.
    Bootstrapping: in order to empirically evaluate the significance of the motif scores, the bootstrap process repeats the entire analysis on <No. bootstrap samples> random gene sets of the same size as that of the supplied target set. Note that this process can take several hours.
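The idea behind bootstrapping can be illustrated with a generic empirical p-value computation. The sketch below is an assumption-laden illustration, not the Amadeus implementation: `score_fn` stands in for a whole motif-finding run that returns its best score (higher is better), and the `(count + 1) / (samples + 1)` estimator is a common convention that Amadeus may or may not use.

```python
import random


def bootstrap_p_value(score_fn, bg_genes, target_genes, n_samples, seed=0):
    """Empirical p-value of the target-set score against random gene sets.

    score_fn: hypothetical callable mapping a set of genes to a score.
    """
    rng = random.Random(seed)
    bg = sorted(bg_genes)          # fixed order so the seed gives repeatable samples
    target = set(target_genes)
    observed = score_fn(target)
    at_least_as_good = 0
    for _ in range(n_samples):
        # Random gene set of the same size as the target set, drawn from the BG set.
        sample = set(rng.sample(bg, len(target)))
        if score_fn(sample) >= observed:
            at_least_as_good += 1
    return (at_least_as_good + 1) / (n_samples + 1)


# Toy example: the "score" is the number of genes that carry a made-up motif.
bg_genes = {f"g{i}" for i in range(100)}
motif_genes = {f"g{i}" for i in range(10)}
p = bootstrap_p_value(lambda genes: len(genes & motif_genes),
                      bg_genes, motif_genes, n_samples=99)
print(p)
```

Because every bootstrap sample repeats the full analysis, the running time grows linearly with the number of samples, which is why the real process can take hours.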

  4. Select score(s) for ranking the motifs
    Each motif considered by Amadeus is evaluated using one or more scores. When several scores are chosen, you may assign them different weights. All scores are combined into a single p-value.
    Enrichment: evaluates the over-representation of the motif in the target set w.r.t. the BG set. Choose one of the variants: "hypergeometric" or "binned". The latter accounts for length and GC biases (e.g., when the target set sequences are more GC-rich than the BG sequences), as described in the paper.
    Strand bias, localization, chromosomal preference: evaluate global spatial features of the motif, namely, whether it is distributed unevenly between the strands, along the sequences, or among the chromosomes. These scores may also be used when the target set is the entire BG set, i.e., when searching for motifs without using a subset of co-expressed genes.
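As an illustration of the "hypergeometric" enrichment variant, the standard hypergeometric tail probability can be computed as follows. This is a textbook formulation given as a sketch; the function and parameter names are hypothetical, and the exact computation inside Amadeus (and the "binned" variant in particular) is described in the paper, not here.

```python
from math import comb


def hypergeometric_enrichment(n_bg, n_bg_hits, n_target, n_target_hits):
    """P(X >= n_target_hits) when n_target sequences are drawn without
    replacement from n_bg sequences, of which n_bg_hits contain the motif."""
    upper = min(n_bg_hits, n_target)
    tail = sum(
        comb(n_bg_hits, k) * comb(n_bg - n_bg_hits, n_target - k)
        for k in range(n_target_hits, upper + 1)
    )
    return tail / comb(n_bg, n_target)


# 10 BG sequences, 4 with a hit; 3 target sequences, 2 with a hit.
print(hypergeometric_enrichment(10, 4, 3, 2))
```

In the example, the tail is (6*6 + 4*1) / 120 = 1/3: a motif found in 2 of 3 target sequences is not surprising when 4 of 10 BG sequences already contain it.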

  5. Start the analysis
    Click the "Run" button to start the analysis. Other buttons in the bottom panel are: Stop run, Save textual output to file, Save parameters to file, Load parameters from file.

  6. Output of Amadeus
    Amadeus produces both textual and graphical output.
    At the top of the textual "Output" tab, Amadeus reports general statistics on the supplied input, e.g., the number of BG/target-set sequences, their average length and their base frequencies. Check these statistics to verify that your input was read correctly.
    Once the analysis is completed, Amadeus shows the discovered motifs in the graphical "Results" tab (see figure below).
    If pairs analysis was chosen, the results are shown in an additional tab.
    For each discovered motif, Amadeus reports its p-value (and fixed p-value, if bootstrapping was run), its graphical logo, the scores it attained (all scores are shown; those used for computing the p-value are marked in boldface), statistics on the number of hits and targets, and a list of similar known motifs from Transfac/miRBase ("Divergence" closer to 0 means higher similarity). Additional information is presented in several pop-up screens (see figure):
    (a) The list of k-mers that comprise the motif (i.e., pass the PWM cutoff).
    (b) A histogram of the locations of the motif's hits in the BG (red) and target set (blue) sequences. Location 0 is the TSS (or, in 3' UTR analysis, the 3' end of the sequence).
    (c) A list of genes whose promoters/3'-UTRs contain a hit of the reported motif. This list can be exported for further analysis.
    (d) The logo of the chosen PWM from Transfac.
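The location histogram in (b) relies on the convention that the TSS sits at the end of each sequence, so hit positions are zero or negative. The sketch below shows one way to reproduce such positions for an exported gene list. It is an illustration under stated assumptions: it matches a single exact k-mer on the given strand only, measures the location from the start of the hit, and uses hypothetical function names; Amadeus scans with a PWM and may define hit positions differently.

```python
from collections import Counter


def hit_locations(sequence, kmer):
    """Start positions of exact k-mer matches, relative to the sequence end (TSS = 0)."""
    sequence = sequence.upper()
    kmer = kmer.upper()
    length = len(sequence)
    return [
        start - length
        for start in range(length - len(kmer) + 1)
        if sequence[start:start + len(kmer)] == kmer
    ]


def location_histogram(locations, bin_width):
    """Count hits per bin; each bin is labeled by its lower (most upstream) edge."""
    return Counter((loc // bin_width) * bin_width for loc in locations)


locations = hit_locations("ACGTTTACGT", "ACGT")
print(locations)
print(location_histogram(locations, bin_width=5))
```

Positions masked with N's never match a k-mer made of A, C, G and T, so masked regions contribute no hits in this sketch.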