======================================================================
PRIMA v1.0 (for Windows)

PRIMA (PRomoter Integration in Microarray Analysis) is a program for
finding transcription factors (TFs) whose binding sites are enriched
in a given set of promoters.

PRIMA is typically used for the analysis of large-scale gene
expression data. Microarray ('DNA chip') measurements point to
alterations in gene expression levels under varying biological
conditions, but they do not directly reveal the transcriptional
networks that underlie the observed transcriptional modulations.
PRIMA is aimed at the identification of TFs that take part in these
networks. The basic biological assumption is that genes that are
co-expressed over multiple biological conditions are regulated by
common TFs, and are therefore expected to share common regulatory
elements in their promoters. By utilizing human genomic sequences and
models for binding sites (BSs) of known TFs, PRIMA identifies TFs
whose BSs are significantly over-represented in a given set of
promoters.

This version of PRIMA is a Windows console application. See "USAGE"
below for how to run PRIMA.

Written by Chaim Linhart, Ran Elkon and Roded Sharan under the
supervision of Prof. Ron Shamir, Tel-Aviv University, ISRAEL,
2002-2003.
Copyright C. Linhart, R. Elkon, R. Sharan, R. Shamir, Tel-Aviv
University, 2003.

======================================================================
CONTACT

For support please contact chaiml@post.tau.ac.il.

======================================================================
REQUIREMENTS

Hardware requirements: Pentium-III PC, 800MHz, 256MB RAM, 300MB free
disk-space.

PRIMA was compiled and tested under WinNT 4.0 using MS VC++ 6.0.
Some parts of PRIMA were written in Perl and compiled using the
perl2exe tool (see http://www.indigostar.com/perl2exe.htm).

======================================================================
CONTENTS

This zip archive contains the following files:

README.txt                 - this file
Prima_v1.0.exe             - the PRIMA executable
coverageProb.exe,
runProfile.exe             - additional executables used by PRIMA
HumanPromoters13K.dat      - putative promoters (1200bp upstream of
                             the TSS) for 12,981 human genes,
                             downloaded from NCBI
bg.all.13K.llid            - a list of the id's of the above 13K
                             genes, used as the background set for
                             PRIMA runs
tar.hcc.568.llid           - a list of 568 genes that are
                             periodically expressed during the human
                             cell-cycle (Whitfield et al., 2002)
tar.hcc.5_phases.568.llid  - the above list partitioned into the 5
                             phases of the cell-cycle
matrix_E2F.dat             - a sample TF file with two PWMs
                             (TRANSFAC format)
output.txt                 - a sample output file (see "USAGE")

======================================================================
USAGE

PRIMA receives three input files:

(1) PWMs - transcription-factor PWMs in TRANSFAC format (a sample
    file, 'matrix_E2F.dat', is supplied in this zip archive);
(2) BG set - a list of LLid's (LocusLink gene id's) used as the
    background set;
(3) Target set - a list of LLid's used as the target set. The file
    may contain several target sets; see, e.g.,
    'tar.hcc.5_phases.568.llid'.

PRIMA reads the promoters of the BG and target sets from the file
'HumanPromoters13K.dat', which contains promoter sequences of length
1200bp for 12,981 human genes (data downloaded from NCBI).

For each PWM in the PWMs file, PRIMA evaluates its enrichment
(over-representation) in the target set by comparing the number of
putative binding-sites, or hits, in the target set and in the
background (BG) set. The enrichment is computed using a
hyper-geometric score, and its p-value is printed. The score is
accurate only if the target set is a subset of the BG set.

PRIMA receives its input parameters using a Unix-style list of flags.
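
As a rough illustration of the hyper-geometric enrichment p-value
mentioned above, and of the constraints on the input sets (the target
set must be a subset of the BG set, and each LLid should appear only
once), here is a minimal Python sketch. It is NOT PRIMA's code: the
function names 'check_sets' and 'hypergeom_tail' are hypothetical, the
example numbers are made up, and the sketch assumes that each promoter
is simply counted as either containing a hit or not. The score that
PRIMA actually uses is described in Elkon et al., 2003, and may differ
from this simplification.

```python
from collections import Counter
from math import comb


def check_sets(target_ids, bg_ids):
    """Report violations of the input constraints stated in this README.

    Both arguments are plain lists of gene ids, already loaded in memory;
    parsing of the .llid files is deliberately left out of this sketch.
    """
    def duplicates(ids):
        return sorted(i for i, n in Counter(ids).items() if n > 1)

    return {
        "duplicate_target": duplicates(target_ids),
        "duplicate_bg": duplicates(bg_ids),
        # Target ids missing from the BG set break the subset assumption.
        "not_in_bg": sorted(set(target_ids) - set(bg_ids)),
    }


def hypergeom_tail(bg_size, bg_hits, target_size, target_hits):
    """Upper-tail hyper-geometric probability P(X >= target_hits).

    X is the number of promoters with a hit among target_size promoters
    drawn without replacement from bg_size promoters, bg_hits of which
    contain a hit (simplifying assumption: one hit/no-hit flag each).
    """
    upper = min(bg_hits, target_size)
    tail = sum(
        comb(bg_hits, k) * comb(bg_size - bg_hits, target_size - k)
        for k in range(target_hits, upper + 1)
    )
    return tail / comb(bg_size, target_size)


# Toy numbers, made up for illustration only (not PRIMA output).
print(check_sets(["11", "12", "13"], ["11", "12", "13", "14", "15"]))
print(hypergeom_tail(bg_size=10, bg_hits=5, target_size=5, target_hits=5))
```

A small p-value from such a test indicates that hits are more frequent
in the target set than would be expected by chance given the BG set.
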
For a description of all available command-line parameters, run:

    Prima_v1.0.exe -h

A typical execution of PRIMA would look like this:

    Prima_v1.0.exe -pwm matrix_E2F.dat -bg bg.all.13K.llid -tar tar.hcc.568.llid

The file 'output.txt' contains PRIMA's output for the above example.
On a Pentium-4 1.4GHz PC, the above run takes approximately 20
minutes.

Important notes:
----------------
* PRIMA can consume a lot of memory. It is recommended that you run
  it on a machine with 256MB RAM or more.
* To run PRIMA, open a DOS command window, enter the folder in which
  you saved the PRIMA files (using the "cd" command, e.g.,
  "cd C:\Prima"), and then run PRIMA, as described above.
* PRIMA creates large temporary files during its execution, which is
  why you need up to 300MB of free hard-disk space. The names of
  these files have the ".tmp" suffix. Normally, PRIMA removes these
  files automatically. However, in some abnormal situations, these
  files might not be deleted. If you find such files in your PRIMA
  folder (while you are NOT running PRIMA), delete them.
* As mentioned above, the enrichment score computed by PRIMA is
  accurate only if the target set is a subset of the BG set.
  Moreover, neither set should contain a gene more than once (i.e.,
  make sure each LLid appears only once in the target set and only
  once in the BG set).
* The algorithm and score used in PRIMA are described in the paper
  cited below (Elkon et al., 2003).

======================================================================
HOW TO CITE THIS WORK

Elkon, R., Linhart, C., Sharan, R., Shamir, R., and Shiloh, Y.,
"Genome-wide In-silico Identification of Transcriptional Regulators
Controlling Cell Cycle in Human Cells", Genome Research, Vol. 13(5),
pp. 773-780, 2003.

======================================================================