Using Microarrays to
Microdissect Cancer Genomes
Zohar Yakhini, Agilent Laboratories and the Technion
Alterations in DNA copy number are associated with a variety of
diseases and conditions, most notably cancer. Comparative genomic
hybridization (CGH) is a technique used to evaluate variations in
genomic copy number in cells. Recent technology developments
introduced an oligonucleotide array platform for array-based
comparative genomic hybridization (aCGH) analyses (Barrett et al.,
PNAS 2004; Brennan et al., Cancer Research 2004; Sebat et al.,
Science 2004). This platform provides increased resolution in
determining the boundaries of measured genome alterations.
To facilitate the use of CGH instrument data in identifying
aberrations in tumor cells, and to leverage these measurements to gain
a better understanding of cancer development and progression, we
develop statistical methods, algorithms, software, and visualization
tools.
In the talk I will review the biological background of cancer genome
instabilities, describe state-of-the-art measurement technologies,
and discuss in detail the data analysis goals, tasks, and solutions.
In particular, we will discuss a statistical framework that enables the
casting of several DNA copy number data analysis questions as
optimization problems over real valued vectors of signals. The
simplest form of the optimization problem seeks to maximize $\score{I}
= \sum_{i \in I} v_i / \sqrt{|I|}$ over all subintervals $I$ in the input
vector. We will present a linear-time approximation scheme for this
problem and prove its guarantees: a procedure with time complexity
$O\left(n\epsilon^{-2}\right)$ that outputs an interval for which
$\score{I}$ is at least ${\mbox{Opt}}/\alpha(\epsilon)$, where
$\mbox{Opt}$ is the actual optimum and
$\alpha(\epsilon)\rightarrow 1$ as $\epsilon \rightarrow 0$. We will
further describe practical implementations that improve the
performance of the naive quadratic approach by orders of
magnitude.
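
For orientation, the naive quadratic baseline mentioned above can be
sketched as follows. This is only an illustrative sketch of exhaustive
search over all subintervals, not the approximation scheme or the
practical implementations presented in the talk. The function name
max_interval_score and the half-open (start, end) index convention are
assumptions made for this example.

```python
from itertools import accumulate
from math import sqrt


def max_interval_score(v):
    """Exhaustively maximize sum(v[start:end]) / sqrt(end - start).

    Returns (best_score, start, end) for the half-open interval
    [start, end). Runs in O(n^2) time using prefix sums.
    """
    if not v:
        raise ValueError("input vector must be non-empty")

    # prefix[k] holds the sum of the first k signals, so any interval
    # sum is a single subtraction.
    prefix = [0.0, *accumulate(v)]
    n = len(v)

    best = (float("-inf"), 0, 0)
    for start in range(n):
        for end in range(start + 1, n + 1):
            score = (prefix[end] - prefix[start]) / sqrt(end - start)
            if score > best[0]:
                best = (score, start, end)
    return best


if __name__ == "__main__":
    # Hypothetical signal vector, chosen only to exercise the function.
    signals = [1, -1, 2, 2, -3, 1]
    print(max_interval_score(signals))
```

The square-root normalization means a long interval with a modest
average signal can outscore a short interval with a high one, which is
why every (start, end) pair is examined in this baseline.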