RAP - Motif finding from protein binding microarray data

RAP (short for Rank, Align, PWM) is a software tool for inferring binding site motifs from protein binding microarrays (PBMs). RAP takes as input a PBM file (a list of DNA probe sequences with their binding intensities) and outputs the binding site motif of the transcription factor (TF) as a position weight matrix (PWM). Further details on the functionality of RAP are available in the paper listed below.

RAP was developed by Yaron Orenstein with Eran Mick in Ron Shamir's Computational Genomics group at Tel Aviv University.

Get the software

Java executable distribution

This distribution is our officially supported executable for RAP. The JAR is self-contained and should run out of the box on any system with Java installed. The package includes a small test dataset.

The software is freely available under the GNU Lesser General Public License, version 3, or (at your option) any later version.

RAP is a research tool, still in the development stage. Hence, it is not presented as error-free, accurate, complete, useful, suitable for any specific application or free from any infringement of any rights. The Software is licensed AS IS, entirely at the user's own risk.

How to use it

java -jar RAP.jar <param_file> <PBM_file> <output_file>

A detailed description of the parameters is given in the file param.docx.

param_pbm.txt is an example parameter file.

PBM files can be downloaded from the UNIPROBE database (http://the_brain.bwh.harvard.edu/uniprobe/).

Example run:

java -jar RAP.jar param_pbm.txt Arid3a_3875.1_v1_deBruijn.txt Arid3a_3875.1_v1.pwm
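When RAP has to be run on many PBM files, it can be convenient to drive it from a script. The following is a minimal Python sketch, not part of the RAP distribution. The helper names build_rap_command and run_rap are hypothetical. The sketch only assembles the three positional arguments shown above, and it assumes that RAP reports failure through a non-zero exit status, which this README does not confirm.

```python
import subprocess


def build_rap_command(param_file, pbm_file, output_file, jar="RAP.jar"):
    """Return the argument list for one RAP run.

    The argument order follows the usage line:
    java -jar RAP.jar <param_file> <PBM_file> <output_file>
    """
    return ["java", "-jar", jar, param_file, pbm_file, output_file]


def run_rap(param_file, pbm_file, output_file, jar="RAP.jar"):
    """Run RAP once; raises CalledProcessError on a non-zero exit status.

    Assumption: RAP signals failure via its exit status.
    """
    subprocess.run(
        build_rap_command(param_file, pbm_file, output_file, jar),
        check=True,
    )


if __name__ == "__main__":
    # File names taken from the example run above.
    cmd = build_rap_command(
        "param_pbm.txt",
        "Arid3a_3875.1_v1_deBruijn.txt",
        "Arid3a_3875.1_v1.pwm",
    )
    print(" ".join(cmd))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.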

Interpreting the output

On the provided test dataset, RAP's output should look like this:
A:	0.183147132396698	0.22810086607933044	0.11032721400260925	0.8135622143745422	0.9158732891082764	. . .	
C:	0.2849732041358948	0.14351284503936768	0.32613304257392883	0.028032639995217323	0.017520010471343994	. . .
G:	0.26065367460250854	0.10847616195678711	0.21236343681812286	0.11921338737010956	0.04402954876422882	. . .
T:	0.2712264955043793	0.5199114084243774	0.35117655992507935	0.03919265791773796	0.022574998438358307	. . .

Each line is of the form nucleotide: [tab] probability_pos_1 [tab] probability_pos_2 [tab] . . .

nucleotide
The nucleotide (A, C, G or T) whose probabilities at every position of the motif are listed on this line.

probability_pos_i
The probability of the nucleotide in position i.
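The output format described above is straightforward to read back into a script. The following is a minimal Python sketch, not part of RAP. The function names parse_pwm and consensus are hypothetical, and the sample holds only the first five columns shown above. It assumes whitespace-separated values after the "nucleotide:" label and ignores any token that is not a number.

```python
def parse_pwm(text):
    """Parse RAP-style PWM text into {nucleotide: [probability, ...]}."""
    pwm = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        label, _, rest = line.partition(":")
        values = []
        for token in rest.split():
            try:
                values.append(float(token))
            except ValueError:
                # Skip non-numeric tokens such as an elision marker.
                pass
        pwm[label.strip()] = values
    return pwm


def consensus(pwm):
    """Return the most probable nucleotide at each position as a string."""
    length = min(len(values) for values in pwm.values())
    return "".join(
        max(pwm, key=lambda nucleotide: pwm[nucleotide][i])
        for i in range(length)
    )


# First five columns of the sample output shown above.
SAMPLE = (
    "A:\t0.183147132396698\t0.22810086607933044\t0.11032721400260925\t0.8135622143745422\t0.9158732891082764\n"
    "C:\t0.2849732041358948\t0.14351284503936768\t0.32613304257392883\t0.028032639995217323\t0.017520010471343994\n"
    "G:\t0.26065367460250854\t0.10847616195678711\t0.21236343681812286\t0.11921338737010956\t0.04402954876422882\n"
    "T:\t0.2712264955043793\t0.5199114084243774\t0.35117655992507935\t0.03919265791773796\t0.022574998438358307\n"
)

if __name__ == "__main__":
    pwm = parse_pwm(SAMPLE)
    print(consensus(pwm))  # prints CTTAA for these five columns
```

A useful sanity check on a parsed matrix is that the four probabilities at each position sum to approximately 1, as they do for the sample columns.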

Citing RAP

RAP can be cited as follows:
RAP: Accurate and fast motif finding based on protein binding microarray data,
Yaron Orenstein, Eran Mick, Ron Shamir.
Journal of Computational Biology. May 2013, 20(5): 375-382.

Get in touch

If you have any questions or suggestions, please feel free to contact Yaron Orenstein or Eran Mick.