The last decade has seen an explosion of large-scale biological data. The human genome was sequenced, along with the genomes of numerous other species, and we are approaching an era in which personal genomes will be sequenced for broad clinical use. DNA microarrays are used to measure gene expression levels under a variety of conditions, and more than half a million microarray profiles are available today. Next-generation sequencing techniques provide fast and cheap measurements of a plethora of biological entities, and their cost is dropping rapidly (a recent report says sequencing cost drops by 50% every five months).
We have been developing methods for the analysis of large-scale gene expression data. Grouping the data into modules is a key step in such analysis, and we have developed, among other methods, the Click clustering algorithm, the Samba algorithm for biclustering, and the Matisse method, which finds modules using expression data and protein interaction networks. Our "flagship" is the Expander platform, which incorporates these algorithms and many others into a streamlined, user-friendly analysis workflow.
The tools are developed in close collaboration with experimentalists and are in broad use by the community for a variety of projects and species. Our own collaborations have included, among others, studies of human DNA damage, the immune system and the cell cycle, yeast genomics, human embryonic stem cells, and human pathogens.
Method development is ongoing: we are adapting our methods to use new data types (e.g., genetic interactions, next-gen sequencing) and to answer challenging new biomedical problems as they arise.