The following key algorithms, developed in our group, are integrated into
Expander. Algorithms marked with * are also available as stand-alone
command-line versions.
CLICK*:
CLICK is a novel clustering algorithm that is applicable to gene expression
analysis as well as to other biological applications. It makes no prior
assumptions about the structure or the number of the clusters. The algorithm
uses graph-theoretic and statistical techniques to identify tight groups of
highly similar elements (kernels), which are likely to belong to the same true
cluster.
SAMBA*:
SAMBA is a novel biclustering algorithm for the identification of modules
of genes that exhibit similar behaviour under a subset of the examined
biological conditions. It is an efficient way to discover statistically
significant biclusters in large-scale biological datasets consisting of
hundreds or thousands of diverse experiments. SAMBA extends the standard
clustering approach by detecting subtle similarities between genes across
subsets of the measured conditions and by allowing genes to participate in
several biclusters. It is therefore more suitable for analyzing heterogeneous
datasets.
TANGO*:
TANGO tests whether the group of genes in each cluster is enriched for a
particular function. The functions of the genes are determined according to
GO annotation files. TANGO performs hypergeometric enrichment tests. Because
the GO functions are highly related, it corrects for multiple testing by
bootstrapping and estimating the empirical p-value distribution for the
evaluated sets.
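
A single hypergeometric enrichment test of the kind mentioned above can be
sketched as follows. This is only an illustration of the statistical test, not
TANGO's code: the function name and parameters are hypothetical, and the
bootstrap-based multiple-testing correction is not shown.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, cluster_size, hits):
    """Upper-tail hypergeometric p-value, P(X >= hits).

    Hypothetical helper (not part of TANGO or Expander):
      population   - number of genes in the background set
      annotated    - background genes annotated with the GO function
      cluster_size - number of genes in the cluster being tested
      hits         - cluster genes annotated with the GO function
    """
    max_hits = min(annotated, cluster_size)
    total = comb(population, cluster_size)
    # Sum the probabilities of drawing `hits` or more annotated genes
    # when sampling `cluster_size` genes without replacement.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Illustrative numbers only: 6000 background genes, 120 annotated,
    # a cluster of 50 genes containing 8 annotated genes.
    print(hypergeom_enrichment_pvalue(6000, 120, 50, 8))
```

A small p-value suggests that the cluster contains more annotated genes than
expected by chance. Because many related GO functions are tested per cluster,
the raw p-values would still need a correction such as the one TANGO applies.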
PRIMA:
PRIMA (PRomoter Integration in Microarray Analysis) is a program for finding
transcription factors (TFs) whose binding sites are enriched in a given set of
promoters. After identifying a group of co-regulated genes using clustering or
biclustering, the promoters of the genes can be analyzed using PRIMA. By
utilizing known models for binding sites (BSs) of TFs, PRIMA identifies TFs
whose BSs are significantly over-represented in that set of promoters. Such TFs
are candidate regulators of the corresponding set of genes.
FAME:
FAME is an algorithm that performs empirical tests using a sampling technique
(random permutations) to estimate whether a given group of genes is enriched
or depleted for the targets of particular miRNA families. The tests account
for biases in the 3' UTR sequences.
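
The general idea of an empirical test based on random sampling can be sketched
as below. This is a deliberately simplified illustration, not FAME itself: it
draws random gene sets uniformly from a background and does not model the 3'
UTR biases that FAME accounts for. All names and parameters are hypothetical.

```python
import random


def empirical_target_pvalues(group, targets, background, n_samples=1000, seed=0):
    """Return (p_enriched, p_depleted) for the overlap of `group` with `targets`.

    Simplified sketch: random gene sets of the same size as `group` are drawn
    uniformly from `background`. Unlike FAME, no 3' UTR bias is modelled.
    """
    rng = random.Random(seed)
    group = set(group)
    targets = set(targets)
    background = list(background)
    observed = len(group & targets)

    at_least = 0  # random sets with overlap >= observed (enrichment)
    at_most = 0   # random sets with overlap <= observed (depletion)
    for _ in range(n_samples):
        sample = rng.sample(background, len(group))
        overlap = len(targets.intersection(sample))
        if overlap >= observed:
            at_least += 1
        if overlap <= observed:
            at_most += 1

    # Add-one correction so that an empirical p-value is never exactly zero.
    p_enriched = (at_least + 1) / (n_samples + 1)
    p_depleted = (at_most + 1) / (n_samples + 1)
    return p_enriched, p_depleted


if __name__ == "__main__":
    genes = [f"gene{i}" for i in range(200)]   # made-up background
    mirna_targets = genes[:20]                 # made-up target set
    cluster = genes[:10] + genes[100:105]      # made-up gene group
    print(empirical_target_pvalues(cluster, mirna_targets, genes))
```

The smallest p-value this sketch can report is 1 / (n_samples + 1), so the
number of samples bounds the resolution of the test.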
MATISSE*:
MATISSE (Module Analysis via Topology of Interactions and Similarity SEts) is
a program for detection of functional modules using interaction networks and
expression data. A functional module is a group of genes that form a connected
component in a protein interaction network and have similar gene expression
patterns.
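
The definition of a functional module given above can be made concrete with a
small check. This sketch only tests whether a given gene set satisfies the two
conditions (connected in the network, similar expression). It is not the
MATISSE search algorithm, and the mean pairwise Pearson correlation with a
fixed threshold is an assumed stand-in for MATISSE's own similarity scoring.
All names are hypothetical.

```python
from collections import deque
from itertools import combinations
from math import sqrt


def is_connected(genes, edges):
    """True if `genes` induce a connected subgraph of the interaction network."""
    genes = set(genes)
    if not genes:
        return False
    neighbours = {g: set() for g in genes}
    for a, b in edges:
        if a in genes and b in genes:  # keep only edges inside the gene set
            neighbours[a].add(b)
            neighbours[b].add(a)
    start = next(iter(genes))
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search from an arbitrary gene
        for nxt in neighbours[queue.popleft()] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen == genes


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def looks_like_module(genes, edges, expression, min_mean_corr=0.8):
    """Check both conditions; `min_mean_corr` is an assumed, arbitrary cutoff."""
    genes = list(genes)
    if len(genes) < 2 or not is_connected(genes, edges):
        return False
    corrs = [pearson(expression[a], expression[b])
             for a, b in combinations(genes, 2)]
    return sum(corrs) / len(corrs) >= min_mean_corr


if __name__ == "__main__":
    # Made-up toy network and expression profiles.
    edges = [("A", "B"), ("B", "C"), ("C", "D")]
    expression = {
        "A": [1.0, 2.0, 3.0, 4.0],
        "B": [2.0, 4.0, 6.0, 8.5],
        "C": [1.0, 3.0, 2.0, 5.0],
        "D": [4.0, 3.0, 2.0, 1.0],
    }
    print(looks_like_module(["A", "B"], edges, expression))
    print(looks_like_module(["A", "C"], edges, expression))
```

Note that connectivity is evaluated on the subgraph induced by the gene set, so
two genes linked only through a gene outside the set do not count as connected.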
DEGAS*:
DEGAS (DysrEgulated Gene set Analysis via Subnetworks) is a method for
identifying connected gene subnetworks that are significantly enriched for
genes dysregulated in specimens of a disease. As input, DEGAS receives
expression profiles of disease patients and of controls, together with a
global network. The subnetworks identified by DEGAS can provide a signature
of the disease that is potentially useful for diagnosis, pinpoint possible
pathways affected by the disease, and suggest targets for drug intervention.