Before putting our analysis into perspective with respect to related work, and to prevent misleading interpretations, it is important to note that the problem of clustering cancer gene expression data (tissues) is very different from that of clustering genes.

In terms of works that do develop an experimental analysis, [] presents an evaluation of four clustering methods (k-means, mixture of Gaussians, density-based clustering and the farthest first traversal algorithm) applied to eight binary class data sets.

Based on the experimental results obtained with these clustering methods, the authors do not indicate that any one method is more suitable than, or preferable to, the others.

In contrast to the work in [], as previously pointed out, our paper presents a first large-scale analysis of seven different clustering methods and four proximity measures applied to 35 cancer gene expression data sets.

While bioinformaticians have proposed new clustering methods that take advantage of characteristics of the gene expression data, the medical community has a preference for using "classic" clustering methods.

There have been no studies thus far performing a large-scale evaluation of different clustering methods in this context.

Such researchers often perform an evaluation of their methods using available public data previously published in clinical studies.
