Authors: J. Salvador Sánchez and Vicente García

In gene-expression microarray data sets, each sample is defined by hundreds or thousands of measurements. High-dimensional data spaces have been reported as a significant obstacle to applying machine learning algorithms, owing to the associated phenomenon known as the ‘curse of dimensionality’. The analysis and interpretation of these data sets have therefore been described as a very challenging problem. The hypothesis proposed in this paper is that there may exist some correlation between dimensionality and the types of samples (safe, borderline, rare and outlier). To examine this hypothesis, we carried out a series of experiments over four gene-expression microarray databases, because these data are a typical example of the ‘curse of dimensionality’ phenomenon. The results show that there are indeed meaningful relationships between dimensionality and the proportion of each type of sample: the number of safe samples increases and the number of borderline samples decreases as the dimensionality of the feature space decreases.
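
The abstract does not say how samples are assigned to the four types. As a rough illustration only, the sketch below assumes a neighbourhood rule that is common in the literature: each sample is labelled by how many of its 5 nearest neighbours (Euclidean distance) share its class, with 4 to 5 giving safe, 2 to 3 borderline, 1 rare and 0 outlier. The function names `sample_types` and `type_proportions`, the value of `K` and the thresholds are assumptions made for this example, not the authors' published procedure.

```python
import math
from collections import Counter

K = 5  # assumed neighbourhood size; the thresholds below are tied to K = 5


def sample_types(X, y):
    """Label each sample by how many of its K nearest neighbours share its class."""
    labels = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Euclidean distance from sample i to every other sample, nearest first
        dists = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        same = sum(1 for _, j in dists[:K] if y[j] == yi)
        if same >= 4:
            labels.append("safe")
        elif same >= 2:
            labels.append("borderline")
        elif same == 1:
            labels.append("rare")
        else:
            labels.append("outlier")
    return labels


def type_proportions(X, y):
    """Return the fraction of samples falling into each of the four types."""
    counts = Counter(sample_types(X, y))
    n = len(y)
    return {t: counts.get(t, 0) / n for t in ("safe", "borderline", "rare", "outlier")}
```

Under these assumptions, calling `type_proportions` on the same data set before and after reducing the number of features would give the kind of per-type proportions the abstract compares across dimensionalities.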