Lowering Uncertainty of Cancer Classification

Oleg Okun, Helen Priisalu

A new ensemble scheme is proposed for classifying high-dimensional data. It exploits the dependence between classification error and data complexity, a measure of how difficult a given dataset is to classify. The task studied is gene-expression-based cancer classification, with a k-nearest neighbor classifier as the base classifier. Experiments carried out on five datasets show the importance of taking dataset complexity into account when constructing ensembles of nearest neighbors.
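
The general idea of a complexity-guided ensemble of nearest neighbors can be illustrated with a minimal sketch. The abstract does not specify the complexity measure, how ensemble members are generated, or how their votes are combined, so everything below is an assumption made for illustration only: complexity is approximated by Fisher's discriminant ratio (a higher ratio meaning an easier dataset), candidate members are random feature subsets, and predictions are combined by majority vote over two classes labeled 0 and 1. The function names and parameters (`fisher_ratio`, `knn_predict`, `complexity_guided_ensemble`, `n_candidates`, `n_members`, `subset_size`) are hypothetical and do not come from the paper.

```python
import numpy as np


def fisher_ratio(X, y):
    """Largest per-feature Fisher discriminant ratio for two classes (0 and 1).

    Used here as an assumed stand-in for a data complexity measure:
    a higher value suggests the classes are easier to separate.
    """
    a, b = X[y == 0], X[y == 1]
    num = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    den = a.var(axis=0) + b.var(axis=0) + 1e-12  # avoid division by zero
    return float(np.max(num / den))


def knn_predict(X_train, y_train, X_test, k=3):
    """Plain k-nearest neighbor prediction with Euclidean distance (odd k)."""
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = y_train[nearest]
    return (votes.mean(axis=1) > 0.5).astype(int)


def complexity_guided_ensemble(X_train, y_train, X_test, n_candidates=50,
                               n_members=5, subset_size=10, k=3, seed=0):
    """Majority vote of k-NN classifiers built on the 'easiest' feature subsets."""
    rng = np.random.default_rng(seed)
    n_features = X_train.shape[1]
    # Assumption: candidate members are random feature subsets.
    subsets = [rng.choice(n_features, size=subset_size, replace=False)
               for _ in range(n_candidates)]
    # Keep the subsets with the lowest estimated complexity
    # (here: the highest Fisher ratio on the training data).
    subsets.sort(key=lambda s: fisher_ratio(X_train[:, s], y_train),
                 reverse=True)
    chosen = subsets[:n_members]
    preds = np.array([knn_predict(X_train[:, s], y_train, X_test[:, s], k)
                      for s in chosen])
    # Majority vote across ensemble members (use an odd n_members).
    return (preds.mean(axis=0) > 0.5).astype(int)
```

Calling `complexity_guided_ensemble(X_train, y_train, X_test)` with NumPy arrays returns one 0/1 label per test sample. Selecting members by a complexity estimate rather than at random is the only point the sketch is meant to convey; the measure and selection rule actually used in the paper may differ.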
