
File Formats

Raw Data file formats:

Expression file:

1) Suffix: no limitations. 
2) Separating token: tab delimiter.
3) Format:

1st line: contains the string "probeId" and a tab delimiter, followed by the string "geneSymbol" and a tab delimiter, followed by the names of all conditions separated by tab delimiters. Each condition name can appear either as a stand-alone name or in the format of <category name>/<condition name>. In the second case condition categories will be used as condition classifications by Expander.

Next lines: Each subsequent line consists of the probe ID (an identifier string that is unique to each probe on the chip), followed by a string representing the gene's full name (if missing, it can be left empty by adding an additional tab delimiter), followed by the probe's expression values, all tab delimited. If the expression file contains missing values, Expander replaces them with 0 when the data is oligonucleotide array data. Expander currently does not handle missing values in cDNA microarray data.

*For example see files “expressionData1.txt” and “expressionData2.txt” in the Expander/sample_input_files/ directory.
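
As an illustration of the layout described above, the following Python sketch reads a small expression file from a string. It is not part of Expander: the function name parse_expression and the sample values are hypothetical, and keeping missing values as None (rather than replacing them with 0) is a choice made here only to keep them visible.

```python
def parse_expression(text):
    """Parse tab-delimited expression data (hypothetical helper, not an Expander API)."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split("\t")
    if header[:2] != ["probeId", "geneSymbol"]:
        raise ValueError("first line must start with probeId<TAB>geneSymbol")

    # A condition is either a stand-alone name or "<category name>/<condition name>".
    conditions = []
    for name in header[2:]:
        category, sep, condition = name.partition("/")
        conditions.append((category, condition) if sep else (None, name))

    rows = {}
    for line in lines[1:]:
        fields = line.split("\t")
        probe_id, gene = fields[0], fields[1]
        # An empty field is a missing value; it is kept as None in this sketch.
        values = [float(v) if v.strip() else None for v in fields[2:]]
        rows[probe_id] = (gene, values)
    return conditions, rows


# Made-up sample data: two categorized conditions and one stand-alone condition.
SAMPLE = (
    "probeId\tgeneSymbol\tcontrol/t0\tcontrol/t1\ttreated\n"
    "p1\tTP53\t1.5\t2.0\t3.25\n"
    "p2\t\t0.5\t\t1.0\n"
)
conditions, rows = parse_expression(SAMPLE)
```

In the sample, probe p2 has an empty gene field (written as an additional tab delimiter) and one missing expression value.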

If the data is not in the above format, it may be possible to load it using the “Advanced” dialog box, which appears upon pressing the “Advanced” button in the Expression Data load dialog box (see Loading Input Data).

Gene Sets file:

1) Suffix: no limitations

2) Format: Each line contains a gene ID, a gene symbol (optional) and the number of its set. Each field in the line should be tab separated from the previous field.
The gene IDs are expected to be of the same convention used in the GO annotation and TF fingerprint files.  For details regarding the Gene ID convention that is used for each organism, refer to the Supplied files section.

*For example see file “geneSetsData1.txt” under the Expander/sample_input_files/ directory (see Sample input files for more details).
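
The gene sets layout can be sketched in Python as follows. This is an illustration only: parse_gene_sets is a hypothetical name, the gene IDs are sample values, and the sketch assumes that the set number is always the last field on the line, so a line with an omitted or empty gene symbol is still accepted.

```python
from collections import defaultdict


def parse_gene_sets(text):
    """Group gene IDs by set number (hypothetical helper, not an Expander API)."""
    sets = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        # Assumption: first field is the gene ID, last field is the set number.
        gene_id, set_number = fields[0], int(fields[-1])
        sets[set_number].append(gene_id)
    return dict(sets)


# Made-up sample: two lines with a symbol, one with an empty symbol, one without.
SAMPLE = "7157\tTP53\t1\n672\tBRCA1\t1\n4609\t\t2\n1956\t2\n"
gene_sets = parse_gene_sets(SAMPLE)
```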

ID conversion file format:

Suffix: Currently, there are no limitations regarding the file name suffix.

Format: Each line contains the probe id as it appears in the data file, a tab separator and the corresponding gene ID (e.g. Entrez/Locus-Link ids for mouse and human genes and ORF codes for yeast).  The second field can be left blank, indicating no conversion for that probe ID.
* Several probe IDs in the data file may be mapped to the same gene ID (e.g. several ESTs from the same gene).
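
A minimal Python sketch of reading such a conversion file is shown here. The names parse_id_conversion and probes_by_gene and the sample IDs are hypothetical; probes with a blank second field are simply left out of the mapping, which is one possible way to represent "no conversion".

```python
def parse_id_conversion(text):
    """Map probe IDs to gene IDs (hypothetical helper, not an Expander API)."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        probe_id, _, gene_id = line.partition("\t")
        gene_id = gene_id.strip()
        # A blank second field means no conversion for this probe ID.
        if gene_id:
            mapping[probe_id] = gene_id
    return mapping


# Made-up sample: two probes of the same gene and one probe without a conversion.
SAMPLE = "probeA\t7157\nprobeB\t7157\nprobeC\t\n"
mapping = parse_id_conversion(SAMPLE)

# Several probe IDs may share one gene ID, so invert the mapping into lists.
probes_by_gene = {}
for probe_id, gene_id in mapping.items():
    probes_by_gene.setdefault(gene_id, []).append(probe_id)
```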

Clustering files format:

1) Suffix: no limitations.

2) Format: Each line contains the probeID, a tab separator and the number of its cluster.
Cluster number 0 is reserved for genes that are left unclustered. The file does not have to contain all genes in the data. If a gene does not appear in the file, it is automatically set as unclustered.  

*For example see file “expressionData1Clustering.sol” (a clustering solution for the data file “expressionData1.txt”) under the Expander/sample_input_files/ directory (see Sample input files section for more details).
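
The rule that cluster number 0 means "unclustered", and that probes missing from the file are also unclustered, can be sketched in Python as follows. The function load_clustering and the sample probe IDs are hypothetical and serve only to illustrate the file layout.

```python
def load_clustering(text, all_probe_ids):
    """Return a cluster number for every probe (hypothetical helper, not an Expander API)."""
    # Every probe starts as unclustered (cluster number 0).
    assignment = {probe_id: 0 for probe_id in all_probe_ids}
    for line in text.splitlines():
        if not line.strip():
            continue
        probe_id, cluster = line.split("\t")
        assignment[probe_id] = int(cluster)
    return assignment


# Made-up sample: p3 is explicitly unclustered, p4 does not appear in the file.
SAMPLE = "p1\t1\np2\t2\np3\t0\n"
assignment = load_clustering(SAMPLE, ["p1", "p2", "p3", "p4"])
```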

Biclustering files format:

1) Suffix: `.bic`.
2) Format: the file is composed of two parts, presented here.

Part 1 presents a summary of the biclusters found.

Part 2 presents the probesets and the conditions contained in each bicluster.

Background set files format:

1) Suffix: no limitation.

2) Format: each line should contain one gene ID. The gene IDs are expected to be of the same convention used in the annotation and TF fingerprint files for the organism you are working on (please refer to the Supplied files section).


