Evaluation of Gene-Expression Clusters via an Information-Theoretic Measure

Irad Ben-Gal and Ido Priness
Dept. of Industrial Engineering
Tel Aviv University (TAU)

A large body of literature addresses the expression profiles of genes under various conditions or in various sample types. A vital step in the analysis of gene-expression data is the identification of clusters of genes that manifest similar expression patterns. Yet little research has been devoted to evaluating the adequacy of the performance measures used to assess the quality of a clustering solution. We propose a measure of cluster homogeneity and separation that is based on mutual information. We test the proposed measure on several public gene-expression datasets. Our results show that the information-based measure outperforms commonly used performance measures that rely on Pearson correlation or Euclidean distance: it yields a more significant differentiation among supervised clustering solutions that contain different numbers of mismatches. Next, we use the new measure to compare known clustering algorithms applied to unsupervised clustering problems.
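The abstract does not give the exact definitions, so the following is only a minimal sketch of how mutual-information-based homogeneity and separation scores might be computed, not the authors' method. It assumes that each expression profile is discretized into equal-width bins (three bins by default, a hypothetical choice), that homogeneity is the average pairwise mutual information between genes in the same cluster (higher is better), and that separation is the average mutual information between cluster centroids (lower is better). The helper names `discretize`, `mutual_information`, `homogeneity`, `centroid` and `separation` are illustrative only.

```python
import math
from collections import Counter
from itertools import combinations


def discretize(profile, bins=3):
    """Map a continuous expression profile to equal-width bin indices (assumed scheme)."""
    lo, hi = min(profile), max(profile)
    if hi == lo:
        return [0] * len(profile)
    width = (hi - lo) / bins
    # min(...) keeps the maximum value inside the last bin
    return [min(int((x - lo) / width), bins - 1) for x in profile]


def mutual_information(x, y, bins=3):
    """Plug-in estimate of I(X;Y), in bits, between two discretized profiles."""
    dx, dy = discretize(x, bins), discretize(y, bins)
    n = len(dx)
    joint = Counter(zip(dx, dy))
    px, py = Counter(dx), Counter(dy)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def homogeneity(clusters, bins=3):
    """Average pairwise MI between genes of the same cluster (higher = more homogeneous)."""
    scores = [mutual_information(g, h, bins)
              for cluster in clusters
              for g, h in combinations(cluster, 2)]
    return sum(scores) / len(scores) if scores else 0.0


def centroid(cluster):
    """Element-wise mean profile of a cluster."""
    return [sum(values) / len(values) for values in zip(*cluster)]


def separation(clusters, bins=3):
    """Average MI between cluster centroids (lower = better separated)."""
    cents = [centroid(c) for c in clusters]
    scores = [mutual_information(a, b, bins) for a, b in combinations(cents, 2)]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Toy data: one cluster of steadily rising profiles, one of oscillating profiles
    rising = [[1, 2, 3, 4, 5, 6], [2, 3, 4, 5, 6, 7]]
    zigzag = [[1, 5, 1, 5, 1, 5], [0, 4, 0, 4, 0, 4]]
    clusters = [rising, zigzag]
    print(f"homogeneity = {homogeneity(clusters):.3f}")
    print(f"separation  = {separation(clusters):.3f}")
```

In this toy case the two centroids are statistically independent after discretization, so the separation score is zero, while genes within each cluster share their full entropy. Unlike Pearson correlation, the mutual-information score would also reward non-linear dependence between profiles; the binning scheme and bin count are assumptions that would need tuning on real data.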