A Novel Feature Selection Technique for Text Classification Using Naïve Bayes.

With the proliferation of unstructured data, text classification (or text categorization) has found many applications, including topic classification, sentiment analysis, authorship identification, and spam detection. Many classification algorithms are available, and naïve Bayes remains one of the oldest and most popular. It is simple to implement and requires less training data than many alternatives. The literature, however, reports that naïve Bayes performs poorly compared to other classifiers in text classification, which makes it unusable in practice in spite of the simplicity and intuitiveness of the model. In this paper, we propose a two-step feature selection method: a univariate feature selection step first reduces the search space, and a feature clustering step then selects relatively independent feature sets. We demonstrate the effectiveness of our method through a thorough evaluation and comparison over 13 datasets. The resulting performance improvement makes naïve Bayes comparable or superior to other classifiers. The proposed algorithm is also shown to outperform traditional methods such as the greedy-search-based wrapper and CFS.
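The two-step idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the univariate score or the clustering algorithm, so the chi-squared score, the absolute Pearson correlation as a similarity measure, the greedy threshold grouping that stands in for clustering, and the names `chi2_score`, `abs_correlation`, `two_step_select`, `top_k`, and `threshold` are all assumptions made for illustration. Features and labels are assumed to be binary (term present or absent, two classes).

```python
from math import sqrt


def chi2_score(column, labels):
    """Chi-squared statistic of a binary feature against binary labels (assumed score)."""
    n = len(labels)
    observed = [[0, 0], [0, 0]]  # observed[feature_value][class_label]
    for f, c in zip(column, labels):
        observed[f][c] += 1
    score = 0.0
    for f in (0, 1):
        for c in (0, 1):
            row_total = observed[f][0] + observed[f][1]
            col_total = observed[0][c] + observed[1][c]
            expected = row_total * col_total / n
            if expected > 0:
                score += (observed[f][c] - expected) ** 2 / expected
    return score


def abs_correlation(a, b):
    """Absolute Pearson correlation between two feature columns (assumed similarity)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column carries no correlation information
    return abs(cov) / sqrt(var_a * var_b)


def two_step_select(X, y, top_k, threshold):
    """Step 1: keep the top_k features by univariate score.
    Step 2: group highly correlated survivors and keep one representative per group."""
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    scores = [chi2_score(col, y) for col in columns]

    # Step 1: univariate ranking shrinks the search space.
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:top_k]

    # Step 2: greedy grouping (a simple stand-in for clustering). Features are
    # visited in descending score order, so the first member of each group,
    # which is the one kept, is also its highest-scoring member.
    representatives = []
    for j in ranked:
        if all(abs_correlation(columns[j], columns[r]) < threshold
               for r in representatives):
            representatives.append(j)
    return representatives


# Toy data: feature 1 duplicates feature 0, feature 2 is a weaker predictor,
# feature 3 is unrelated to the label.
y = [1, 1, 1, 1, 0, 0, 0, 0]
X = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
selected = two_step_select(X, y, top_k=3, threshold=0.8)
print(selected)  # [0, 2]
```

In this toy run, the univariate step discards the uninformative feature 3, and the grouping step drops feature 1 because it is redundant with feature 0. Removing such redundancy is relevant to naïve Bayes because the model assumes features are conditionally independent given the class.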

Journal:	Int Sch Res Notices	Impact factor:	0.000
Year:	2014	Citation:	2014 Oct 28; 2014:717092
DOI:	10.1155/2014/717092
