A filter-based feature selection using two criterion functions and evolutionary fuzzification
dc.contributor.advisor | Ohm Sornil | th |
dc.contributor.author | Jitwadee Chaiyakarn | th |
dc.date.accessioned | 2016-05-16T04:49:13Z | |
dc.date.available | 2016-05-16T04:49:13Z | |
dc.date.issued | 2013 | th |
dc.date.issuedBE | 2556 | th |
dc.description | Dissertation (Ph.D. (Computer Science))--National Institute of Development Administration, 2013. | th |
dc.description.abstract | In the information age, data has become increasingly large in both dimension (the number of features) and volume. Data mining processes, such as data classification and data clustering, can be time-consuming and can produce poor results on high-dimensional data because of the so-called curse of dimensionality. Feature selection is a fundamental technique that selects only the most significant features and eliminates irrelevant and redundant features from the entire feature set. This dissertation focuses on filter-based feature selection. This technique can take less time to select significant features, especially for high-dimensional data, but cannot guarantee an optimal feature set. Filter-based feature selection comprises two important parts: the search process and criterion function evaluation. Floating search is commonly used for the search process. It is a heuristic search that does not take much time but cannot guarantee an optimal feature set. The latter part relies on a criterion function, which is an independent measure used to evaluate and select feature subsets without actually running a data mining algorithm. Therefore, it does not inherit any bias of the data mining algorithm. Usually, only one criterion function is used, so only one characteristic of the data is considered at a time. In this dissertation, two criterion functions are proposed for feature evaluation. The two functions can complement each other, and two or more characteristics of the data can be considered together to select features effectively. Noise, ambiguity, and uncertainty in data, which are frequently found in real-world problems, can affect the data mining process. Hence, fuzzy logic was applied to cope with these problems in this dissertation. A membership function was needed in the fuzzy logic to fuzzify the original data, that is, to map the data into fuzzy values. 
The fuzzy values were then passed through the feature selection process instead of the original data. A genetic algorithm (GA) was used to determine the irregular shape of the membership function instead of relying on a human expert. In the experiments, the two proposed criterion functions were found to be effective at selecting features that increase the accuracy of data classification. The proposed method outperforms two existing methods, the hybrid and the one-criterion-function filter-based methods. The experimental results also show that the proposed method with fuzzy logic enhances classification accuracy. It outperforms some wrapper-based feature selection methods, which are widely known to achieve higher accuracy than filter-based methods. The proposed feature selection method can also be used to reduce data dimension for unsupervised learning problems, such as data clustering. Unlike in supervised learning problems, there is no class label attribute on the data objects to guide their grouping into clusters. Hence, it is not an easy task to select discriminant features for unsupervised learning problems. Criterion functions, or measures, for unsupervised learning problems were also proposed for use with the proposed method. The experimental results showed that the proposed method can help improve clustering accuracy compared with the results of other approaches. Therefore, the proposed feature selection method can be used for both supervised and unsupervised learning problems. | th |
dc.format.extent | 83 leaves | th |
dc.format.mimetype | application/pdf | th |
dc.identifier.doi | 10.14457/NIDA.the.2013.21 | |
dc.identifier.other | b184489 | th |
dc.identifier.uri | http://repository.nida.ac.th/handle/662723737/3027 | th |
dc.language.iso | eng | th |
dc.publisher | National Institute of Development Administration | th |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | th |
dc.subject | Evolutionary fuzzification | th |
dc.subject.other | Criterion Functions | th |
dc.title | A filter-based feature selection using two criterion functions and evolutionary fuzzification | th |
dc.type | text--thesis--doctoral thesis | th |
mods.genre | Dissertation | th |
mods.physicalLocation | National Institute of Development Administration. Library and Information Center | th |
thesis.degree.department | School of Applied Statistics | th |
thesis.degree.discipline | Computer Science | th |
thesis.degree.grantor | National Institute of Development Administration | th |
thesis.degree.level | Doctoral | th |
thesis.degree.name | Doctor of Philosophy | th |