Swimming in GOBS of Data

Eugene Kolker, The BIATECH Institute, Seattle

Biology is transforming into an integrative, information-based science with two main characteristics: global or high-throughput (HTP) analysis, and data integration. This presentation will focus on data analysis for global/HTP studies. Global genome-scale studies are still at a very early stage of development and are therefore subject to considerable noise. To paraphrase Ken Nealson, these studies often generate gobs of data, where "GOBS" is an acronym for "Generation Of BS". The HTP data obtained must be critically re-analyzed with regard to their accuracy and their utility for quantitative analysis. One of the main problems inherent in most such HTP data is the absence of "known" genome-scale datasets that can be used for testing and validation. We are currently developing such experimental datasets of known standards. These complex experimental standards will make it possible to identify and improve the major bottlenecks of HTP approaches. In addition to the experimental standards, flexible, transparent, and statistically sound computational platforms should be developed. These platforms will make it possible to implement new statistical models and software tools that serve as both the method and the means of validating HTP studies. Together, the experimental standards and computational platforms will introduce study- (sample-)specific approaches for global genome-scale studies, which in turn will result in deeper biological knowledge.