A Tool For Reconstructing Gene Regulatory Networks

ModEnt is a computational tool that reconstructs gene regulatory networks from high-throughput experimental data. The method used by ModEnt is described in the paper listed under Reference below.

Overview of ModEnt


Overview

What do I need to do in order to run the analysis on my data?

You need to prepare the input files that are described below, and put them in the same directory as the program that you downloaded. Three of the input files are obligatory, and the fourth is optional.
The program is run from the command line. The command line arguments are also described below.
The program output files are generated in the same directory.

Input files

There are three obligatory input files that you need to prepare in order to run the analysis, and one optional file.

The first file is the chip features file. This file provides information about the different experiments. It has a simple tab-delimited format. Name this file "Features.txt".
The second file is the regulators file. The regulators file is a list of the names of all the regulators that you want to consider. If every gene is a potential regulator, the list should contain all the genes. Name this file "Regulators.txt".
The third file is the expression file. It contains a table of the expression values of all the genes over all the experiments. Each column corresponds to a different gene, and each row to a different experiment. Name this file "ExpressionData.txt".
The fourth, optional file is the knowledge file. It contains a priori knowledge that you have about the network. Name this file "Knowledge.txt".
Descriptions of the input files and some samples follow.

The Chip Features File

This file contains a description of all the chips in the dataset.

The information that should be provided in the features file is as follows:

Experiment: The experiment number

Conditions: The experimental conditions. Multiple values should be separated by commas. If these are not available use the notation NA instead. See also the example table.

Condition levels: The strength of the different conditions, given as values between 0.0 and 1.0. Multiple values should be separated by commas. If these are not available use the notation NA instead. See the example table.

Strain: The strain of cells that was used in the experiment. If this is not available use the notation NA instead. See the example table.

Deleted Genes: The names of the genes that were deleted in the experiment. Multiple values should be separated by commas. If these are not available use the notation NA instead. See the example table.

Overexpressed Genes: The names of the genes that were over-expressed in the experiment. Multiple values should be separated by commas. If these are not available use the notation NA instead. See the example table.

Time: If the experiment is part of a time series, the time of the experiment. Note that time-series experiments should be listed consecutively. If this is not available use the notation NA instead. See the example table.

As mentioned above, in order to indicate that a field is not available, use the notation NA. If you use NA in the Time column, it means that the experiment is not part of a time series.

The following table provides an example:
#Experiment  Conditions  ConditionLevels  Strain  DeletedGenes  OverexpressedGenes  Time
1            NA          NA               NA      NA            NA                  NA
2            NA          NA               NA      NA            NA                  NA
3            NA          NA               NA      NA            NA                  NA
4            C1          0.5              NA      NA            NA                  NA
5            C1          1.0              NA      NA            NA                  NA
6            NA          NA               NA      NA            NA                  0
7            NA          NA               NA      NA            NA                  30
8            NA          NA               NA      NA            NA                  60
9            NA          NA               NA      G5            NA                  NA
10           NA          NA               NA      G7            NA                  NA
11           NA          NA               NA      G5,G8         NA                  NA
12           C2,C3       NA               NA      NA            G4                  NA
13           C2,C3       NA               S1      NA            G4                  NA

Row 1 means that in the first experiment we do not provide information about the experimental conditions or the strain, none of the genes were overexpressed or deleted, and the chip was not part of a time series.
Row 5 means that in the fifth experiment the experimental condition was the same as in the fourth experiment but at double intensity.
Row 7 means that the seventh experiment was the second time point in a time series that started with the experiment in row 6.
Row 9 means that in the ninth experiment the gene called "G5" was deleted.
An example of a features file can be found here.
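
The field rules above can be checked before running the program. The following Python sketch is not part of ModEnt; it assumes the seven tab-delimited columns shown in the example table, a header line that starts with "#", and NA for missing values. The helper name parse_features is hypothetical.

```python
import csv
import io

# Column names taken from the header of the example table.
COLUMNS = ["Experiment", "Conditions", "ConditionLevels", "Strain",
           "DeletedGenes", "OverexpressedGenes", "Time"]
# Columns that may hold several comma-separated values.
LIST_COLUMNS = {"Conditions", "ConditionLevels", "DeletedGenes", "OverexpressedGenes"}


def parse_features(handle):
    """Parse a tab-delimited features table into a list of dicts (hypothetical helper)."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and the header line
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}: {fields}")
        row = {}
        for name, value in zip(COLUMNS, fields):
            value = value.strip()
            if name in LIST_COLUMNS:
                row[name] = [] if value == "NA" else value.split(",")
            else:
                row[name] = None if value == "NA" else value
        rows.append(row)
    return rows


# Three rows copied from the example table (experiments 5, 7 and 11).
sample = (
    "#Experiment\tConditions\tConditionLevels\tStrain\tDeletedGenes\tOverexpressedGenes\tTime\n"
    "5\tC1\t1.0\tNA\tNA\tNA\tNA\n"
    "7\tNA\tNA\tNA\tNA\tNA\t30\n"
    "11\tNA\tNA\tNA\tG5,G8\tNA\tNA\n"
)
features = parse_features(io.StringIO(sample))
print(features[2]["DeletedGenes"])  # ['G5', 'G8']
```

To check a real file, pass an open handle instead, for example open("Features.txt", newline="").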


The Regulators File

This file contains a list of all the genes that may regulate other genes. Make sure that the names of the genes that are listed as regulators are identical to the names that you use in the features file and the expression file.
An example of a regulators file can be found here.

The Expression File

The expression file contains the microarray expression values for all the genes. Make sure that the names of the genes that are listed are identical to the names that you use in the features file and the regulators file.
An example of an expression file can be found here.
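
Because gene names must match across files, a quick consistency check can catch spelling or case differences early. The sketch below is only an illustration and makes two assumptions that this page does not confirm: that the regulators file holds one name per line, and that the first line of the expression file is a tab-delimited row of gene names. All function names are hypothetical.

```python
def read_regulators(text):
    """One regulator name per line (assumed layout); blank lines are ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def expression_genes(text):
    """Gene names from the first line of the expression table (assumed header row)."""
    header = text.splitlines()[0]
    return [name.strip() for name in header.split("\t") if name.strip()]


def missing_regulators(regulators_text, expression_text):
    """Return the regulator names that do not appear among the expression columns."""
    genes = set(expression_genes(expression_text))
    return [name for name in read_regulators(regulators_text) if name not in genes]


# Made-up contents: "g5" differs from "G5" only by case, so it is reported.
regulators_text = "G1\nG2\ng5\n"
expression_text = "G1\tG2\tG5\n0.1\t0.4\t0.9\n0.3\t0.2\t0.7\n"
print(missing_regulators(regulators_text, expression_text))  # ['g5']
```

An empty list means every regulator name was found among the expression columns under these assumptions.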

The Knowledge File

This input file is optional. It contains regulator-gene connections that are retrieved from the literature or from other types of experimental data, such as ChIP-chip.
These connections are given as tab-delimited triplets: Regulator, Gene, Confidence.
An example of a knowledge file can be found here.
If you use a knowledge file, name it "Knowledge.txt".
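
As an illustration of the triplet format, the following Python sketch reads tab-delimited Regulator, Gene, Confidence lines. It assumes that the confidence is a number, which this page does not state, and the helper name parse_knowledge and the sample values are hypothetical.

```python
def parse_knowledge(text):
    """Parse tab-delimited Regulator/Gene/Confidence triplets (hypothetical helper)."""
    triplets = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(f"line {number}: expected 3 tab-separated fields, got {len(fields)}")
        regulator, gene, confidence = (field.strip() for field in fields)
        # Assumption: the confidence column is numeric.
        triplets.append((regulator, gene, float(confidence)))
    return triplets


# Made-up connections, for illustration only.
sample = "G1\tG7\t0.9\nG2\tG7\t0.6\n"
for regulator, gene, confidence in parse_knowledge(sample):
    print(regulator, gene, confidence)
```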

Command line arguments

The program can take the following command line arguments:

-f : This will run a faster algorithm that may be less accurate but will reduce running time significantly. Recommended for large networks.
-r : Change the default maximal number of regulators per gene. The default number is 3. For example, using -r 10 will allow every gene to have up to 10 regulators.
-l : If you want the program to find the logic tables that describe how the regulators of each gene affect it, use this option. This option increases the running time.
-c : Clusters together all the genes that have the same regulators.
-k : Use a given network topology, and compute only the logic tables (in this case you need to provide a knowledge file). Use this option in addition to the -l option.
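
The options can be combined. The Python sketch below only assembles an argument list from the flags described above; it does not run anything. The executable name "./modent" is a placeholder, since the actual name depends on the file you downloaded, and build_command is a hypothetical helper.

```python
def build_command(executable, fast=False, max_regulators=None,
                  logic=False, cohorts=False, fixed_topology=False):
    """Assemble a command line from the documented flags (hypothetical helper)."""
    if fixed_topology and not logic:
        # The text above says -k is used in addition to -l.
        raise ValueError("-k is meant to be used together with -l")
    command = [executable]
    if fast:
        command.append("-f")
    if max_regulators is not None:
        command += ["-r", str(max_regulators)]
    if logic:
        command.append("-l")
    if cohorts:
        command.append("-c")
    if fixed_topology:
        command.append("-k")
    return command


# "./modent" is a placeholder for the downloaded program.
command = build_command("./modent", fast=True, max_regulators=10, logic=True)
print(" ".join(command))  # ./modent -f -r 10 -l
```

The resulting list can be passed to subprocess.run(command, check=True) from the directory that holds the input files.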

Output files

The Results File

The results file contains lines of the following format:

Regulator    Regulatee    pp

where Regulator is the name of a gene from the regulators file, and Regulatee is any other gene. Such a triplet means that Regulator is predicted to regulate Regulatee.
The "pp" is added for visualization purposes: it allows you to view all the predicted regulations as a network using Cytoscape.
The file name of the results file is Results.txt.
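
For further analysis, the triplets can be collected into a mapping from each gene to its predicted regulators. This Python sketch assumes that the three fields are separated by whitespace, as in the format line above; the helper name regulators_by_gene and the sample lines are illustrative only.

```python
from collections import defaultdict


def regulators_by_gene(text):
    """Map each regulated gene to its regulators, in file order (hypothetical helper)."""
    regulators = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        regulator, regulatee, _interaction = fields  # third field is the "pp" marker
        regulators[regulatee].append(regulator)
    return dict(regulators)


# Made-up results, for illustration only.
sample = (
    "Regulator1\tGene1\tpp\n"
    "Regulator2\tGene1\tpp\n"
    "Regulator1\tGene2\tpp\n"
)
network = regulators_by_gene(sample)
print(network["Gene1"])  # ['Regulator1', 'Regulator2']
```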

The Logic File

This file gives the logic tables for every gene in the network. It is only generated if you use the -l option.
For example, suppose the results file contains the following regulators (in this order) for Gene1:
Regulator1 Gene1 pp
Regulator2 Gene1 pp
and the logic file shows the table:
Gene1
0 0 | 0
1 0 | 1
0 1 | 1
1 1 | 1
The first column corresponds to the level of Regulator1, the second column to the level of Regulator2, and the third column (after the |) to the level of Gene1.
In this example, when either Regulator1 or Regulator2 has level 1, the level of Gene1 is 1. When both regulators have level 0, the level of Gene1 is also 0.
The file name of the logic file is Logic.txt.
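
The table from the example can be read as a lookup from regulator levels to the level of the gene. The Python sketch below parses the four rows shown above and confirms that they describe a logical OR; it assumes integer levels and the "inputs | output" row layout of the example, and it does not attempt to read a complete Logic.txt, whose overall layout is not specified here.

```python
def parse_logic_table(rows):
    """Turn rows such as '1 0 | 1' into a dict from input levels to output level."""
    table = {}
    for row in rows:
        inputs, output = row.split("|")
        table[tuple(int(level) for level in inputs.split())] = int(output)
    return table


# The rows of the Gene1 table from the example above.
rows = ["0 0 | 0", "1 0 | 1", "0 1 | 1", "1 1 | 1"]
table = parse_logic_table(rows)

# Column order follows the order of the regulators in the results file.
regulators = ["Regulator1", "Regulator2"]
levels = {"Regulator1": 0, "Regulator2": 1}
gene1_level = table[tuple(levels[name] for name in regulators)]
print(gene1_level)  # 1

# The table matches OR: the output is 1 exactly when any input is 1.
is_or = all(output == int(any(inputs)) for inputs, output in table.items())
print(is_or)  # True
```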

The Cohorts File

This file lists groups of genes ("cohorts") that have exactly the same regulator set. It is only generated if you use the -c option.
The file name of the cohorts file is Cohorts.txt.
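
To make the definition concrete, the following Python sketch groups genes by their regulator set, starting from (regulator, gene) pairs such as those in the results file. It illustrates the definition only and is not the program's clustering code. The layout of Cohorts.txt is not described here, and whether the program also reports single-gene groups is not stated. The helper name and gene names are made up.

```python
from collections import defaultdict


def group_into_cohorts(edges):
    """Group genes that share exactly the same set of regulators (hypothetical helper)."""
    regulator_sets = defaultdict(set)
    for regulator, gene in edges:
        regulator_sets[gene].add(regulator)
    cohorts = defaultdict(list)
    for gene, regulators in regulator_sets.items():
        cohorts[frozenset(regulators)].append(gene)
    return {key: sorted(genes) for key, genes in cohorts.items()}


# Made-up predictions: GeneA and GeneB share regulators R1 and R2.
edges = [
    ("R1", "GeneA"), ("R2", "GeneA"),
    ("R2", "GeneB"), ("R1", "GeneB"),
    ("R1", "GeneC"),
]
cohorts = group_into_cohorts(edges)
print(cohorts[frozenset({"R1", "R2"})])  # ['GeneA', 'GeneB']
```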

Reference:

Karlebach, G. and Shamir, R., Constructing logical models of gene regulatory networks by integrating transcription factor-DNA interactions with expression data: an entropy based approach. Journal of Computational Biology. In Press. PDF Supplementary Material

Download the program

License Agreement:


Contact Us

If you have any questions, don't hesitate to contact us at

ModEnt was developed by Guy Karlebach at Ron Shamir's Computational Genomics Group, Tel Aviv University.