Swimming in GOBS of Data
Eugene Kolker, The BIATECH Institute, Seattle
Biology is transforming into an integrative, information-based science
with two main characteristics: global, or high-throughput (HTP), analysis
and data integration. This presentation will focus on data analysis for
global/HTP studies. Global, genome-scale studies are still at a very early
stage of development and are therefore subject to considerable noise.
To paraphrase Ken Nealson, these studies often generate gobs of data,
where "GOBS" is an acronym for "Generation Of BS". The resulting HTP data have to
be critically reanalyzed with regard to their accuracy and their utility for
quantitative analysis. One of the main problems inherent in most
HTP studies is the absence of "known" genome-scale datasets that can be
used for testing and validation. We are currently developing such experimental
datasets of known standards. These complex experimental standards will make it
possible to identify and alleviate the major bottlenecks of HTP approaches.
In addition to the experimental standards, flexible, transparent, and
statistically sound computational platforms should be developed as well.
These computational platforms will enable the implementation of new statistical
models and software tools, serving as both the method and the means of
validating HTP studies. Together, the experimental standards and
computational platforms will introduce study-specific (sample-specific) approaches
to global, genome-scale studies, which in turn will yield deeper
biological knowledge.