Evaluation of Gene Expressions Clusters via Information Theoretic Measure
Irad Ben-Gal and Ido Priness
Department of Industrial Engineering
Tel Aviv University
A large body of literature addresses the expression profiles of genes under
various conditions or in various sample types. A vital step in the analysis
of gene expression data is the identification of clusters that manifest
similar expression patterns. Yet little research has been devoted to
evaluating the adequacy of the performance measures that are used to
assess the quality of a clustering solution. We propose a measure for the
Homogeneity and the Separation of clusters that is based on mutual
information metrics. We test the proposed measure on several public
gene-expression datasets. Our results show that the information-based
measure outperforms commonly used performance measures that rely on Pearson
correlation or Euclidean distance. In particular, the proposed measure
differentiates more significantly among supervised clusters with different
numbers of mismatches. We then use the new measure to compare known
clustering algorithms applied to unsupervised clustering problems.
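
To make the idea of an information-based Homogeneity and Separation score concrete, the following is a minimal Python sketch and not the exact measure proposed in this work. It rests on several assumptions that do not come from the text above: expression profiles are discretized into equal-width bins, mutual information is estimated from the resulting joint histogram, Homogeneity is taken as the average pairwise mutual information within clusters, and Separation as the average pairwise mutual information between clusters. The function names `mutual_information` and `homogeneity_separation` and the parameter `bins` are hypothetical.

```python
import itertools

import numpy as np


def mutual_information(x, y, bins=3):
    """Estimate I(X;Y) in bits from two expression profiles.

    Assumption: profiles are discretized into `bins` equal-width bins
    and probabilities are estimated from the joint histogram.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                # joint distribution
    px = pxy.sum(axis=1, keepdims=True)      # marginal of X, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)      # marginal of Y, shape (1, bins)
    nz = pxy > 0                             # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))


def homogeneity_separation(data, labels, bins=3):
    """Return (homogeneity, separation) for a clustering solution.

    Assumption: homogeneity is the mean pairwise MI of genes that share
    a cluster, and separation is the mean pairwise MI of genes in
    different clusters (lower values mean better-separated clusters).
    """
    within, between = [], []
    for i, j in itertools.combinations(range(len(data)), 2):
        mi = mutual_information(data[i], data[j], bins)
        (within if labels[i] == labels[j] else between).append(mi)
    homogeneity = float(np.mean(within)) if within else float("nan")
    separation = float(np.mean(between)) if between else float("nan")
    return homogeneity, separation


# Toy data: rows are genes, columns are conditions.
a = [0, 1, 2, 0, 1, 2, 0, 1, 2]
b = [0, 0, 0, 1, 1, 1, 2, 2, 2]
data = np.array([a, a, b, b], dtype=float)
labels = [0, 0, 1, 1]

H, S = homogeneity_separation(data, labels)
print(f"homogeneity = {H:.3f} bits, separation = {S:.3f} bits")
```

In this toy example the two genes in each cluster have identical profiles, while profiles from different clusters are statistically independent, so the sketch reports a Homogeneity of about log2(3) bits and a Separation of about zero. Unlike Pearson correlation, mutual information also captures non-linear dependence between profiles, although histogram-based estimates are sensitive to the choice of `bins` and to the number of conditions.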