PRIMA - software for promoter analysis
Tel-Aviv University, Jan. 2003.
Last updated on Aug. 2004.
PRIMA (PRomoter Integration in Microarray Analysis) is a program for finding transcription factors (TFs) whose binding sites are enriched in a given set of promoters. PRIMA is typically used for the analysis of large-scale gene expression data. Microarray ('DNA chip') measurements point to alterations in gene expression levels under varying biological conditions, but they do not directly reveal the transcriptional networks that underlie the observed changes in transcription. PRIMA aims to identify the TFs that take part in these networks. The basic biological assumption is that genes that are co-expressed over multiple biological conditions are regulated by common TFs and are therefore expected to share common regulatory elements in their promoters. Using human genomic sequences and models for binding sites (BSs) of known TFs, PRIMA identifies TFs whose BSs are significantly over-represented in a given set of promoters.
New: Please read the Prima updates.
PRIMA requires two data collections:
Promoter sequences.
We constructed a set of putative promoters of known human genes by extracting, from the human genome, the 1200 bp upstream of each gene's putative transcription start site (TSS), based on the gene start annotations (the human genome was downloaded from NCBI in July 2001). Human repetitive sequences are masked. The set contains putative promoters for 12,981 human genes (we call it the '13K set').
The 13K set can be downloaded here (4MB, gzipped).
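The promoter extraction described above can be sketched as follows. This is a minimal illustration, not the code used to build the 13K set: the function name `upstream_promoter`, the 0-based TSS coordinate, and the handling of the minus strand by reverse complement are all assumptions made for the example.

```python
# Complement table used to reverse-complement promoters of minus-strand genes.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def upstream_promoter(chrom_seq, tss, strand, length=1200):
    """Return up to `length` bp immediately upstream of the TSS.

    chrom_seq : chromosome sequence given on the plus strand
    tss       : 0-based index of the TSS on the plus strand (assumption)
    strand    : "+" or "-", the strand on which the gene is annotated
    The result is written 5'->3' on the gene's own strand and is shorter
    than `length` if the TSS lies close to the end of the sequence.
    """
    if strand == "+":
        return chrom_seq[max(0, tss - length):tss]
    if strand == "-":
        # Upstream of a minus-strand gene lies at higher plus-strand coordinates.
        region = chrom_seq[tss + 1:tss + 1 + length]
        return region.translate(_COMPLEMENT)[::-1]
    raise ValueError("strand must be '+' or '-'")
```

Masked repeats would simply appear as runs of `N` (or lowercase letters, depending on the masking convention) in the returned string.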
Models for BSs recognized by TFs.
PRIMA models the binding sites recognized by TFs with the commonly used position weight matrices (PWMs). In our analysis, PWMs were obtained from the TRANSFAC database (Wingender et al., 2000).
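To illustrate what a PWM is, the sketch below builds a log-odds matrix from per-position base counts and scores a candidate site against it. It shows the general PWM idea only; the uniform background, the pseudocount, and the function names `log_odds_pwm` and `score_site` are assumptions for the example and are not taken from PRIMA or TRANSFAC.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, background=0.25, pseudocount=0.01):
    """Turn per-position base counts into a log-odds PWM.

    counts is a list with one dict per motif position, e.g. {"A": 10}.
    A uniform background and a small pseudocount are assumed here.
    """
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def score_site(pwm, site):
    """Similarity score of a candidate site: the sum of per-position log-odds."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal the PWM width")
    return sum(column[base] for column, base in zip(pwm, site))
```

With a toy three-position motif that strongly prefers `TAT`, `score_site` returns a positive score for `TAT` and a negative one for `GGG`.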
PRIMA gets as input two sets of genes: a target set (e.g., a list of co-expressed genes
found in a microarray experiment) and a background set (e.g., the 13K set),
and for each PWM P it performs the following steps:
1. Compute a similarity threshold T(P). Subsequences of the scanned promoters whose similarity scores exceed this threshold are considered 'hits' of P (i.e., putative binding sites of the TF modeled by the PWM).
2. Scan the promoters of the target and background sets to identify hits of P.
3. Apply a statistical test to examine whether hits of P are significantly over-represented in the target set relative to the background set.
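The final step can be illustrated with a simple enrichment test. The sketch below is a stand-in, not PRIMA's actual statistic: it assumes a plain hypergeometric tail test on the number of promoters that contain at least one hit, and it ignores multiple hits per promoter and differences in promoter length. The function names and the `hit_counts` input format are hypothetical.

```python
from math import comb


def hypergeom_tail(N, K, n, k):
    """P(X >= k) when n items are drawn without replacement from N, K of them marked."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / total


def enrichment_pvalue(hit_counts, target):
    """Over-representation p-value for one PWM.

    hit_counts : dict mapping every background gene to the number of hits
                 of the PWM found in its promoter
    target     : iterable of gene names (genes outside the background are ignored)
    """
    N = len(hit_counts)                                   # background promoters
    K = sum(1 for c in hit_counts.values() if c > 0)      # ... with at least one hit
    genes = [g for g in set(target) if g in hit_counts]
    n = len(genes)                                        # target promoters
    k = sum(1 for g in genes if hit_counts[g] > 0)        # ... with at least one hit
    return hypergeom_tail(N, K, n, k)
```

Because one such test is run per PWM, the resulting p-values would normally need a multiple-testing correction before any TF is reported as enriched.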
The full details of the algorithm and the relevant computational analysis are
described in Elkon et al. (2003).
PRIMA is written in Perl and C, and runs under Windows and Linux.
See the "README.txt" file provided with the software.
A sample output file can be viewed here.
We demonstrated the utility of PRIMA in deciphering regulatory mechanisms that control gene expression in Elkon et al. (2003). In this study we analyzed the human cell cycle dataset published by Whitfield et al. (2002), which recorded genome-wide gene expression levels at multiple time points during the progression of the cell cycle in the human HeLa cell line. PRIMA revealed 8 TFs whose binding sites were significantly over-represented in the promoters of cell cycle-regulated genes. The enrichment of some
of these factors was specific to certain phases of the cell cycle.
The eight circles in the Figure below correspond to the TFs that were highly enriched
in promoters of cell cycle-regulated genes. Each circle is divided into 5
zones, corresponding to cell cycle phases. The number adjacent to each zone
is the ratio between the prevalence of the TF's hits in the promoters of the
corresponding cell cycle phase cluster and their prevalence in the set of 13K
background promoters. Note that several TFs show a tendency towards specific
cell cycle phases: e.g., over-representation of the E2F PWM in promoters of
the G1/S and S clusters, and its under-representation in promoters of the
A new version of PRIMA is integrated in the EXPANDER
software (see also Updates), which is freely available for academic use.
The standalone version of PRIMA is freely available for academic use under the following
It is also available for non-academic use under appropriate licensing.
Please contact Ron Shamir or
Chaim Linhart for further information.
Aug. '04: New promoter sequences (from 1000 bp upstream to 200 bp downstream of the TSS, with repetitive sequences masked out),
downloaded from Ensembl:
HumanPromoters_v19.txt.zip - 19,565 human promoters, Ensembl release 19.34b.
MousePromoters_v19.txt.zip - 20,028 mouse promoters, Ensembl release 19.32.
In order to run Prima on these promoters, please download the
Oct. '03: A new version of PRIMA, which uses
precomputed fingerprint files
(for both human and mouse) and is therefore much faster, is now available as part of the EXPANDER software.
The fingerprint of a gene is the number of hits (putative binding-sites) of the various TFs
that were identified in its promoter. The standalone version of PRIMA recomputes the
fingerprints in each execution. While this allows more flexibility (e.g., in choosing
the thresholds for declaring hits), this process is very time consuming.
EXPANDER, on the other hand, executes PRIMA on a fixed set of precomputed fingerprints,
which were constructed as follows:
A set of about 17,000 human promoter sequences, spanning from 1000 bp upstream of the
TSS to 200 bp downstream of the TSS, was scanned in order to locate putative BSs (hits).
The scan was performed for each TF motif (PWM) in TRANSFAC (version 5.4, April '02)
that corresponds to a human TF.
The information on the number of hits of each PWM in a promoter is
called the fingerprint of that promoter.
The fingerprints of all human promoters are supplied with EXPANDER.
The human promoter sequences were downloaded from Ensembl (release 13.30).
Another set of fingerprints was prepared on mouse promoters (15,000 promoters, Ensembl release 13.30).
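The notion of a fingerprint can be made concrete with a short sketch. The code below counts, for every promoter and every PWM, the windows whose score exceeds that PWM's threshold. It is illustrative only: the data layout (`models` mapping a PWM name to a `(pwm, threshold)` pair, where each PWM is a list of per-base score dicts), the forward-strand-only scan, and the function names are assumptions, not the format of the fingerprint files shipped with EXPANDER.

```python
def count_hits(seq, pwm, threshold):
    """Number of windows in seq whose PWM score exceeds threshold.

    Windows containing anything other than A, C, G, T (e.g. masked
    repeats written as N) are skipped. Only the given strand is scanned,
    which is a simplification made for this example.
    """
    width = len(pwm)
    seq = seq.upper()
    hits = 0
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(base not in "ACGT" for base in window):
            continue
        score = sum(column[base] for column, base in zip(pwm, window))
        if score > threshold:
            hits += 1
    return hits


def fingerprints(promoters, models):
    """Map each gene to {PWM name: number of hits in its promoter}."""
    return {
        gene: {name: count_hits(seq, pwm, threshold)
               for name, (pwm, threshold) in models.items()}
        for gene, seq in promoters.items()
    }
```

Once such a table has been computed, later analyses only need to look up counts, which is why running on precomputed fingerprints avoids rescanning the sequences.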
For most users we recommend using EXPANDER, both for promoter analysis
and other computational and visualization tasks.
PRIMA is accessible via the "Made In Israel" bioinformatics portal.
Elkon, R., Linhart, C., Sharan, R., Shamir, R., and Shiloh, Y.,
"Genome-wide In-silico Identification of Transcriptional Regulators Controlling
Cell Cycle in Human Cells",
Genome Research, Vol. 13(5), pp. 773-780, 2003.
Whitfield, M.L., Sherlock, G., Saldanha, A.J., Murray, J.I., Ball, C.A., Alexander, K.E., Matese, J.C., Perou, C.M., Hurt, M.M., Brown, P.O., and Botstein, D.,
"Identification of genes periodically expressed in the human cell cycle
and their expression in tumors",
Mol Biol Cell, Vol. 13, pp. 1977-2000, 2002.
Wingender, E., Chen, X., Hehl, R., Karas, H., Liebich, I., Matys, V., Meinhardt, T., Pruss, M., Reuter, I., and Schacherer, F.,
"TRANSFAC: an integrated system for gene expression regulation",
Nucleic Acids Res, Vol. 28, pp. 316-319, 2000.
EXPANDER - A Gene Expression Analysis and Visualization Software
The Ensembl Project