Clique graph projection and clustering in gene expression data.

Zohar Yakhini , Zohar Yakhini Chemical and Biological Systems Department Agilent Laboratories, a HP subsidiary.

Recent advances in biotechnology allow researchers to measure expression levels for thousands of genes simultaneously, across different conditions and over time. Analysis of data produced by such experiments offers potential insight into gene function and regulatory mechanisms. A key step in the analysis of gene expression data is the detection of groups of genes that manifest similar expression patterns. In this work we model this clustering problem by a projection problem on graphs. Namely: for a given graph G(V,E), find a graph Q over V which is a disjoint union of cliques and closest to G under this constraint. We describe an algorithmic solution to this problem. We define an appropriate stochastic model on the input; an underlying clique graph is contaminated by randomly bringing edges up and down. Under the conditions of the model, the algorithm recovers the underlying clique graph with high probability. The running time of the algorithm on an $n$-gene dataset is $O(n \log(n))$. Time permitting, proofs for these claims will be given in the talk. A practical heuristic based on the same algorithmic ideas will also be discussed. Actual gene expression data examples and results from simulations will be presented. This is joint work with Amir Ben-Dor, Ron Shamir, and others.