Prev   Next   Top

Preprocessing GE Data:

The following preprocessing schemes can be performed using EXPANDER:

1) Flooring:  setting all expression values that are bellow a certain threshold (set by the user) into that threshold. This can be performed through the Advanced Data Load dialog box, and is available only for oligonucleotide data. 

2) Normalization: required in order to remove systematic variation, i.e. variation arising from reasons other than biological     differences between RNA samples.

Expander performs normalization only for oligonucleotide data, since it is assumed that the cDNA microarray data is already normalized, as it is input after performing log ratio (log2R/G).

Normalization can be performed using the following schemes:

Quantile normalization (Preprocessing >> Normalization >> Quantile), in which the whole data is used.

Non linear baseline normalization (Preprocessing >> Normalization >> Non Linear Baseline), which uses a baseline array (can be selected by the user). In this scheme a normalization function is calculated using pseudo Loess regression of the M vs. A scatter plot. The subset of genes that are used to evaluate the normalization function can be set to “all genes” (recommended when most genes in the dataset are expected to be constantly expressed) or a “rank invariant set” of genes (recommended when there can be a large number of differentially expressed genes).

For more details regarding the normalization schemes see the References section.

3) Condition filtration: the conditions used in the analysis can be manually filtered by selecting Preprocessing>>Filter Conditions. This will bring up a dialog box in which the user can select the required conditions from a list.

4) Gene filtration:  can be performed in order to filter out some of the constantly expressed genes, and perform downstream analysis on a smaller informative subset of the genes.

Data Filtration can be performed using the following schemes:

Fold Change (Preprocessing >> Filter Probes >> Fold Change) -  when using this method only genes that are over/under expressed by at least n fold in at least k arrays are selected (n and k are determined by the user). The fold change can be calculated in relation to (a) a selected baseline array (b) the minimal expression value of the gene OR (c) the reference value when working on cDNA microarrays (depending on the user’s selection).

Variation (Preprocessing >> Filter Probes >> Variation) - when using this method the k most variant genes are selected (k is determined by the user). Variance is used to measure variation for cDNA Microarray data, and Coefficient of Variation is used to measure variation for oligonucleotide data.

 5) Standardization: When expression values between different genes are very different, but general expression patterns are similar (high Pearson Correlation values), we would expect to see this similarity when looking on a pattern display. Since the absolute values of expression are different, a manipulation is required, in order to view the patterns on the same scale. This manipulation is called standardization.

Standardization can be performed using the following schemes (Preprocessing >>  Standardization):

Mean 0 and Variance 1 – normalizes each expression pattern to have a mean of 0 and a variance of 1. This method is appropriate in most cases when working on genes.

Fixed norm - normalizes each expression pattern to have a fixed norm i.e. expression levels are divided by the norm of that expression vector (the root of sum of squares of that vector). This method is appropriate when different mean values or variances are expected for different patterns (e.g. when working on conditions and expecting larger variance in later phases of a response.

 

After performing a preprocessing operation, the information regarding the operation is added to the “Preprocessed Data” section in the “Session Data” tab. In addition, the “Preprocessed Data box plot” and “Preprocessed Expression Matrix” tab are automatically updated according to the new values in the dataset.  

 

Upon selecting Preprocessing>>Undo the data is changed to be as it was before the most recent preprocessing operation was performed, and the corresponding information is removed from the “Preprocessed Data” section. The “Preprocessed Data box plot” and “Preprocessed Expression Matrix” tab are automatically updated accordingly.

All the above operations can be performed before running further analysis on the data and generating displays.  When attempting to perform further preprocessing operations after analysis results and visualizations have been generated, the following dialog box appears:

Upon choosing to open an additional data sheet, a new data set view tab called “Data Sheet 2” is added to the main frame. The title of this tab is highlighted (colored in purple), indicating that it is now the active data sheet (i.e. all further operations refer to this data sheet). The active data sheet is automatically changed according to the selected (front) visualization tab.

Preprocessed gene expression data can be saved to a file at any time be selecting Preprocessing>>Save Preprocessed Data. The data is written in the same format defined for input GE data.

 



Prev   Next   Top