The Biozon system for complex analysis of heterogeneous interrelated biological data and discovery of emergent structures

Golan Yona, Department of Computer Science, Technion

The Biozon system (biozon.org) is a knowledge resource of heterogeneous biological data. Informally, Biozon can be described as a combination of Amazon and Google applied to the diverse domain of biological knowledge.
This resource merges the holdings of more than a dozen molecular biology collections, including SwissProt, KEGG, PDB, BIND, and others, and augments this data with novel in-house derived data such as sequence or structure similarity, predicted interactions, and predicted domains. Currently, Biozon holds more than 90 million biological documents and 2.5 billion relations between them.
Biozon allows complex searches on the data graph that specify desired interrelationships between data types (for example, 3D structures for all proteins that interact with the protein BRCA1). Moreover, Biozon has a fuzzy search engine that extends complex searches to include homologous sequences or structures, or even genes with similar expression profiles, as a search step. One can search, for example, for all proteins that are known to take part in a specific pathway, or for proteins whose expression profiles (associated with the corresponding mRNA sequences) are similar to those of these proteins. Biozon also integrates a first-of-its-kind biological ranking system that resembles the methods implemented in Google.
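As an illustration only, the BRCA1 example above can be sketched as a two-step traversal of a small typed graph. This is not Biozon's implementation or query interface; the class `DataGraph`, the relation names `interacts_with` and `has_structure`, and every identifier other than BRCA1 are hypothetical placeholders chosen for this sketch.

```python
from collections import defaultdict


class DataGraph:
    """Toy typed graph: nodes carry a type, edges carry a relation name."""

    def __init__(self):
        self.node_type = {}
        # (node, relation) -> set of neighboring nodes
        self.edges = defaultdict(set)

    def add_node(self, node_id, node_type):
        self.node_type[node_id] = node_type

    def add_relation(self, a, relation, b):
        # Relations are stored as undirected for simplicity.
        self.edges[(a, relation)].add(b)
        self.edges[(b, relation)].add(a)

    def neighbors(self, node_id, relation, node_type):
        # Follow one relation and keep only neighbors of the requested type.
        found = self.edges.get((node_id, relation), set())
        return {n for n in found if self.node_type[n] == node_type}


def structures_of_interactors(graph, protein_id):
    """Return {partner: structures} for partners that have a structure."""
    result = {}
    for partner in graph.neighbors(protein_id, "interacts_with", "protein"):
        structures = graph.neighbors(partner, "has_structure", "structure")
        if structures:
            result[partner] = structures
    return result


def build_toy_graph():
    # All data below is made up for illustration (assumption, not real records).
    g = DataGraph()
    for p in ("BRCA1", "protein_X", "protein_Y", "protein_Z"):
        g.add_node(p, "protein")
    for s in ("struct_A", "struct_B", "struct_C"):
        g.add_node(s, "structure")
    g.add_relation("BRCA1", "interacts_with", "protein_X")
    g.add_relation("BRCA1", "interacts_with", "protein_Y")
    g.add_relation("protein_X", "has_structure", "struct_A")
    g.add_relation("protein_X", "has_structure", "struct_B")
    g.add_relation("protein_Z", "has_structure", "struct_C")
    return g


if __name__ == "__main__":
    print(structures_of_interactors(build_toy_graph(), "BRCA1"))
```

In this toy setting, a fuzzy search step would amount to one more relation to traverse (for example, a hypothetical `similar_to` edge between sequences) inserted between the two steps.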
Biozon is linked to other research projects in my lab, such as pathway prediction, domain-based protein hierarchy, detection of semantically significant domain architectures and novel embedding techniques that we have developed to construct a complete "road map" of the protein universe.