Removing Redundancy from Relevant Features in Text Classification

E. Montañes, I. Díaz, E. F. Combarro, J. Ranilla.

This paper proposes a method for feature selection in text categorization that works in two steps: first, a relevance analysis is performed, and then a redundancy analysis is carried out. To analyse redundancy, a range of similarity measures is adopted and converted into symmetric ones using several aggregation operators. This ensures that the similarity between two words does not depend on the order in which they are considered. Experiments on four corpora show that the method achieves good results.
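The abstract does not name the similarity measures, the aggregation operators, or the relevance measure, so the sketch below is only an illustration of the two-step idea under stated assumptions, not the authors' algorithm. It assumes an asymmetric co-occurrence measure (the fraction of documents containing word a that also contain word b), uses min, max and the arithmetic mean as example aggregation operators, and takes precomputed relevance scores as input. The names `asym_sim`, `sym_sim`, `select_features`, `POSTINGS` and `RELEVANCE` are hypothetical, and the toy data is invented for illustration.

```python
from typing import Callable, Dict, List, Set

# Hypothetical representation: each word maps to the set of ids of documents containing it.
Postings = Dict[str, Set[int]]
Aggregator = Callable[[float, float], float]


def asym_sim(a: str, b: str, postings: Postings) -> float:
    """Assumed asymmetric measure: fraction of a's documents that also contain b."""
    docs_a = postings[a]
    if not docs_a:
        return 0.0
    return len(docs_a & postings[b]) / len(docs_a)


# Example aggregation operators (assumptions, not necessarily those used in the paper).
AGGREGATORS: Dict[str, Aggregator] = {
    "min": min,
    "max": max,
    "mean": lambda x, y: (x + y) / 2,
}


def sym_sim(a: str, b: str, postings: Postings, agg: Aggregator) -> float:
    """Aggregating both directions with a symmetric operator gives sym_sim(a, b) == sym_sim(b, a)."""
    return agg(asym_sim(a, b, postings), asym_sim(b, a, postings))


def select_features(postings: Postings, relevance: Dict[str, float],
                    k: int, threshold: float, agg: Aggregator) -> List[str]:
    # Step 1 (relevance): keep the k words with the highest (precomputed) relevance scores.
    relevant = sorted(relevance, key=relevance.get, reverse=True)[:k]
    # Step 2 (redundancy): visit words from most to least relevant and drop a word
    # if its symmetric similarity to any already-kept word reaches the threshold.
    kept: List[str] = []
    for word in relevant:
        if all(sym_sim(word, other, postings, agg) < threshold for other in kept):
            kept.append(word)
    return kept


# Hypothetical toy data: document ids per word and placeholder relevance scores.
POSTINGS: Postings = {
    "stock": {1, 2, 3, 4},
    "shares": {1, 2, 3},
    "market": {1, 2, 5},
    "goal": {6, 7},
}
RELEVANCE: Dict[str, float] = {"stock": 0.9, "shares": 0.8, "market": 0.7, "goal": 0.5}

for name in ("min", "max"):
    print(name, select_features(POSTINGS, RELEVANCE, k=4, threshold=0.8, agg=AGGREGATORS[name]))
# min ['stock', 'shares', 'market', 'goal']
# max ['stock', 'market', 'goal']
```

In this toy run, "shares" appears only in documents that also contain "stock" (1.0 in one direction) but not the reverse (0.75). With min, the pair counts as redundant only when both directions are high, so "shares" survives at threshold 0.8; with max, one high direction suffices and "shares" is removed. The choice of aggregation operator therefore controls how aggressively redundant words are pruned.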
