Finding interesting patterns in microarray data
Teng Li, Lynn
Research Associate,
Department of Computer Science and Engineering,
The Chinese University of Hong Kong
Abstract:
Expression patterns in microarray data provide clues to gene functions, cell types, and interactions among genes or gene products. An interesting expression pattern can be any non-random pattern that occurs with significant frequency or that exhibits a particular trend. An intuitive starting point for microarray data analysis is clustering, which groups together genes (or samples) with similar expression patterns. However, clustering algorithms run into significant difficulty when the patterns are not apparent at the global level.
Firstly, we proposed algorithms to find local patterns in microarray data. These local patterns, also called biclusters, involve only a subset of genes and a subset of samples, and the structure of a bicluster can take different forms. ISWCC (Iteratively Sorting with Weighted Correlation Coefficient) was proposed to find biclusters with coherent values. It applies the dominant set approach to create sets of sorting vectors for the rows of the data matrix, and the co-expressed rows are gathered together after sorting. By alternately sorting and transposing the data matrix, the blocks of co-expressed genes and samples are moved to the corners of the matrix. A weighted correlation coefficient is used to measure similarity at both the gene level and the sample level, and its weights are updated in each iteration using the sorting vector from the previous iteration. We then proposed GPF (Growing Prefix and Suffix) and GFP (Growing Frequent Position) to find biclusters with coherent evolutions, more specifically, order-preserving patterns. The proposed heuristic algorithms output significant biclusters more efficiently, with lower space and computation costs. The proposed algorithms were also compared with some typical biclustering methods with respect to their capability of identifying biclusters of different models, their sensitivity to parameter settings, and their resistance to noise. The comparison yields some interesting conclusions.
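To make two of the notions above concrete, the following is a minimal sketch, not the ISWCC, GPF, or GFP algorithms themselves. It shows a weighted Pearson correlation coefficient and a check for an order-preserving pattern. The function names, the toy data, and the 0/1 weights are illustrative assumptions; the abstract does not specify how ISWCC derives its weights from the sorting vectors.

```python
import math

def weighted_corr(x, y, w):
    """Weighted Pearson correlation of two expression profiles.

    w holds one non-negative weight per sample (hypothetical weights;
    how ISWCC actually updates them is not described in the abstract).
    """
    total = sum(w)
    w = [wi / total for wi in w]                      # normalise weights
    mx = sum(wi * xi for wi, xi in zip(w, x))         # weighted means
    my = sum(wi * yi for wi, yi in zip(w, y))
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(vx * vy)

def is_order_preserving(data, genes, samples):
    """True if every selected gene ranks the selected samples identically.

    Assumes no tied expression values within a gene.
    """
    def ranking(g):
        return sorted(samples, key=lambda s: data[g][s])
    first = ranking(genes[0])
    return all(ranking(g) == first for g in genes[1:])

# Toy matrix (rows = genes, columns = samples); values are made up.
data = [
    [1, 2, 3, 9, 0],
    [2, 4, 6, 0, 9],
    [10, 20, 35, 5, 1],
]

r_global = weighted_corr(data[0], data[1], [1, 1, 1, 1, 1])
r_local = weighted_corr(data[0], data[1], [1, 1, 1, 0, 0])
local_ok = is_order_preserving(data, [0, 1, 2], [0, 1, 2])
global_ok = is_order_preserving(data, [0, 1, 2], [0, 1, 2, 3, 4])
```

In this toy example the first two genes are negatively correlated over all five samples, yet perfectly correlated once the weights restrict attention to the first three samples. Likewise, the three genes order the first three samples identically but not the full set, which is the kind of local structure that global clustering can miss.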
Secondly, we proposed the idea of DDP (Discovering Distinct Patterns) in microarray data, which is to find the genes that have significantly different patterns. DDP is useful for scaling down the analysis when there is little prior knowledge. A DDP algorithm is proposed that iteratively picks out the pairs of genes with the largest dissimilarities. Experiments were conducted on both synthetic data sets and real microarray data. The results show that the algorithm is effective and efficient in finding functionally significant genes. The usefulness of genes with distinct patterns for constructing simplified gene regulatory networks is further discussed.
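The iterative pair-picking step can be sketched as follows. This is only an illustration under stated assumptions, not the DDP algorithm from the talk: dissimilarity is taken to be one minus the Pearson correlation, each gene is picked at most once, and the function name `pick_distinct_genes` and the toy data are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Plain Pearson correlation; assumes neither profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def pick_distinct_genes(data, n_pairs):
    """Greedily pick up to n_pairs gene pairs with the largest dissimilarity.

    Assumption: dissimilarity = 1 - Pearson correlation, and a gene that
    has been picked is not considered again.
    """
    remaining = set(range(len(data)))
    dissim = {
        (i, j): 1.0 - pearson(data[i], data[j])
        for i, j in combinations(range(len(data)), 2)
    }
    selected = []
    for _ in range(n_pairs):
        candidates = [
            (d, pair) for pair, d in dissim.items()
            if pair[0] in remaining and pair[1] in remaining
        ]
        if not candidates:
            break                      # fewer than two genes left
        _, (i, j) = max(candidates)    # most dissimilar remaining pair
        selected += [i, j]
        remaining -= {i, j}
    return selected

# Toy profiles (rows = genes); values are made up for illustration.
genes = [
    [1, 2, 3, 4],
    [4, 3, 2, 1],
    [1, 2, 3, 5],
    [1, 3, 2, 4],
]
first_pair = pick_distinct_genes(genes, 1)
all_pairs = pick_distinct_genes(genes, 3)
```

Here the first pair returned consists of the two genes with opposite trends. Asking for more pairs than the data can supply simply stops once fewer than two unpicked genes remain.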