The Biozon system for complex analysis of heterogeneous interrelated
biological data and discovery of emergent structures
Golan Yona, Department of Computer Science, Technion
The Biozon system (biozon.org) is a knowledge resource that integrates
heterogeneous biological data. Informally, Biozon can be described as a
combination of Amazon and Google applied to the diverse domain of
biological knowledge.
This resource merges the holdings of more than a dozen molecular biology
collections, including SwissProt, KEGG, PDB, and BIND. It augments
these data with novel data derived in-house, such as sequence and
structure similarities, predicted interactions, and predicted domains.
Currently, Biozon holds more than 90 million biological documents and
2.5 billion relations between them.
Biozon allows complex searches on the data graph that specify desired
interrelationships between data types (for example, the 3D structures of
all proteins that interact with the protein BRCA1). Moreover, Biozon has
a fuzzy search engine that extends complex searches to include
homologous sequences or structures as a search step, or even genes
with similar expression profiles. One can search, for example, for
all proteins that are known to take part in a specific pathway, or
for proteins whose expression profiles (associated with the
corresponding mRNA sequences) are similar to those of these proteins.
Biozon also integrates a first-of-its-kind biological ranking system
that resembles the ranking methods implemented in Google.
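To make the ideas of typed path searches, a fuzzy (homology) search step, and Google-like ranking more concrete, here is a minimal sketch on a toy graph. It is not Biozon's schema, API, or algorithm. The node identifiers other than BRCA1, all edges, the relation names (`interacts`, `has_structure`, `similar_to`), the similarity threshold, and the helper functions are hypothetical. The abstract does not specify the ranking method, so the sketch substitutes a plain PageRank-style power iteration as an assumed stand-in.

```python
from collections import defaultdict

# Toy typed graph. All identifiers except "BRCA1" are hypothetical placeholders,
# and every edge (including BRCA1's) is invented for illustration only.
NODE_TYPE = {
    "BRCA1": "protein", "PROT_A": "protein", "PROT_B": "protein", "PROT_C": "protein",
    "STRUCT_1": "structure", "STRUCT_2": "structure", "STRUCT_3": "structure",
}

# (source, relation, target, score); the score only matters for "similar_to" edges.
# Relation names are assumptions, not Biozon's actual schema.
EDGES = [
    ("BRCA1", "interacts", "PROT_A", 1.0),
    ("BRCA1", "interacts", "PROT_B", 1.0),
    ("PROT_A", "has_structure", "STRUCT_1", 1.0),
    ("PROT_B", "has_structure", "STRUCT_2", 1.0),
    ("PROT_C", "has_structure", "STRUCT_3", 1.0),
    ("PROT_B", "similar_to", "PROT_C", 0.8),  # hypothetical homology edge
]


def build_adjacency(edges):
    """Index edges by (node, relation); each relation is traversable both ways."""
    adj = defaultdict(list)
    for src, rel, dst, score in edges:
        adj[(src, rel)].append((dst, score))
        adj[(dst, rel)].append((src, score))
    return adj


def path_query(adj, starts, steps):
    """Follow a typed path; each step is (relation, required target type)."""
    frontier = set(starts)
    for rel, target_type in steps:
        frontier = {
            dst
            for node in frontier
            for dst, _ in adj.get((node, rel), [])
            if NODE_TYPE[dst] == target_type
        }
    return frontier


def fuzzy_expand(adj, nodes, min_score):
    """Fuzzy step: add nodes one similarity edge away with score >= min_score."""
    expanded = set(nodes)
    for node in nodes:
        for dst, score in adj.get((node, "similar_to"), []):
            if score >= min_score:
                expanded.add(dst)
    return expanded


def pagerank(edges, damping=0.85, iterations=50):
    """PageRank-style power iteration on the undirected version of the toy graph."""
    nodes = sorted(NODE_TYPE)
    neighbors = defaultdict(set)
    for src, _, dst, _ in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = neighbors[v] if neighbors[v] else nodes  # dangling: spread uniformly
            share = damping * rank[v] / len(targets)
            for u in targets:
                new_rank[u] += share
        rank = new_rank
    return rank


adj = build_adjacency(EDGES)
partners = path_query(adj, {"BRCA1"}, [("interacts", "protein")])
exact = path_query(adj, partners, [("has_structure", "structure")])
# Fuzzy variant: insert a homology step before asking for structures.
fuzzy = path_query(adj, fuzzy_expand(adj, partners, 0.5), [("has_structure", "structure")])
rank = pagerank(EDGES)
print(sorted(exact))  # ['STRUCT_1', 'STRUCT_2']
print(sorted(fuzzy, key=rank.get, reverse=True))
```

In this sketch, the fuzzy step is a separate set-expansion function. That design choice lets it be inserted at any position in a typed path, which loosely mirrors the description of homology as "a search step". The ranking here scores nodes only by graph connectivity. A real biological ranking system would presumably take more into account.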
Biozon is linked to other research projects in my lab, such as pathway
prediction, a domain-based protein hierarchy, detection of semantically
significant domain architectures, and novel embedding techniques that
we have developed to construct a complete "road map" of the protein
universe.