Clique graph projection and clustering in gene expression data.
Zohar Yakhini
,
Zohar Yakhini
Chemical and Biological Systems Department
Agilent Laboratories, a HP subsidiary.
Recent advances in biotechnology allow researchers to measure
expression levels for thousands of genes simultaneously, across
different conditions and over time. Analysis of data produced by
such experiments offers potential insight into gene function and
regulatory mechanisms. A key step in the analysis of gene
expression data is the detection of groups of genes that manifest
similar expression patterns.
In this work we model this clustering problem by a projection problem on
graphs. Namely: for a given graph G(V,E), find a graph Q over V which is a
disjoint union of cliques and closest to G under this constraint. We
describe an algorithmic solution to this problem. We define an appropriate
stochastic model on the input; an underlying clique graph is contaminated by
randomly bringing edges up and down. Under the conditions of the model, the
algorithm recovers the underlying clique graph with high probability. The
running time of the algorithm on an $n$-gene dataset is $O(n \log(n))$. Time
permitting, proofs for these claims will be given in the talk.
A practical heuristic based on the same algorithmic ideas will also be
discussed. Actual gene expression data examples and results from simulations
will be presented.
This is joint work with Amir Ben-Dor, Ron Shamir, and others.