An adaptive multi-level sequential floating feature selection

dc.contributor.advisor Ohm Sornil
dc.contributor.author Knitchepon Chotchantarakun
dc.description Thesis (Ph.D. (Computer Science and Information Systems))--National Institute of Development Administration, 2020
dc.description.abstract Dealing with large amounts of available data has become a major challenge in data mining and machine learning. Feature selection is a significant preprocessing step that selects the most informative features by removing irrelevant and redundant ones, especially for large datasets. The selected features play an important role in information searching and in enhancing the performance of machine learning models for tasks such as classification and prediction. Several strategies have been proposed over the past few decades. In this dissertation, we propose a new technique called the Adaptive Multi-level Sequential Floating Feature Selection (AMFFS). AMFFS consists of three proposed algorithms: One-level Forward Inclusion (OLFI), One-level Forward Multi-level Backward Selection (OFMB), and Multi-level Forward Inclusion (MLFI). The proposed methods are deterministic sequential feature selection algorithms under the supervised learning model. The OFMB algorithm consists of two parts. The first part creates preliminarily selected subsets whose performance is similar to that of the Improved Forward Floating Selection (IFFS); this part follows the same procedure as the OLFI algorithm. The second part improves on the previous result using a multi-level backward searching technique. The idea is to apply an improved step during feature addition and an adaptive search method in the backtracking step. However, to keep execution time low, we limit the depth of backward searching by introducing an adaptive variable called the generalization limit. The MLFI algorithm also consists of two parts. The first part searches for the maximum classification accuracy by applying a multi-level forward searching technique. The second part improves on the previous result by applying a weak-feature replacement technique.
The idea is to apply an adaptive multi-level forward search method with a replacement step during feature addition, without any backtracking search. As in OFMB, the depth of forward searching is limited by the generalization limit. In the experiments, we applied KNN, Naive Bayes, and Decision Tree classifiers as our criterion functions. We tested our algorithms on fourteen standard UCI datasets and compared their classification accuracy with that of other popular methods. Our proposed algorithms outperformed the other sequential feature selection techniques on the majority of the tested datasets, although the OFMB and MLFI algorithms require more computation time than the other methods due to the complexity of the multi-level search steps.
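The sequential floating search family that AMFFS builds on can be illustrated in a few lines. The sketch below is a generic sequential forward floating selection loop with a capped backtracking depth, loosely in the spirit of the generalization limit described above; `sffs`, `toy_score`, and the `backtrack_limit` parameter are illustrative stand-ins, not the dissertation's actual algorithms or criterion functions.

```python
def sffs(features, score, target_size, backtrack_limit=2):
    """Greedy forward selection with a limited number of floating
    (conditional backward) steps after each feature addition.

    Illustrative sketch only: `backtrack_limit` plays a role loosely
    analogous to the dissertation's generalization limit.
    """
    selected = []
    while len(selected) < target_size:
        # Forward step: add the single feature that maximizes the criterion.
        candidates = [f for f in features if f not in selected]
        best = max(candidates, key=lambda f: score(selected + [f]))
        selected.append(best)

        # Floating step: conditionally remove up to `backtrack_limit`
        # features, keeping each removal only if it improves the score.
        # The just-added feature is never removed, to avoid cycling.
        for _ in range(backtrack_limit):
            removable = [f for f in selected if f != best]
            if len(removable) < 2:
                break
            worst = max(removable,
                        key=lambda f: score([g for g in selected if g != f]))
            reduced = [g for g in selected if g != worst]
            if score(reduced) > score(selected):
                selected = reduced
            else:
                break
    return selected


# Toy criterion standing in for cross-validated classifier accuracy:
# reward overlap with a known informative set, penalize subset size.
INFORMATIVE = {0, 3, 5}

def toy_score(subset):
    return len(INFORMATIVE & set(subset)) - 0.1 * len(subset)


print(sffs(list(range(8)), toy_score, target_size=3))  # → [0, 3, 5]
```

In a real setting the criterion function would wrap a classifier (e.g., KNN) evaluated by cross-validation, which is why limiting the depth of the floating search matters: each extra level multiplies the number of criterion evaluations.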
dc.format.extent 87 leaves
dc.publisher National Institute of Development Administration
dc.rights This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
dc.subject Classification accuracy
dc.subject Feature selection
dc.subject Sequential search
dc.subject.other Dimension reduction (Statistics)
dc.subject.other Dimensional analysis
dc.subject.other Sequential analysis
dc.subject.other Data mining
dc.subject.other Supervised learning (Machine learning)
dc.title An adaptive multi-level sequential floating feature selection
dc.type text--thesis--doctoral thesis
mods.physicalLocation National Institute of Development Administration. Library and Information Center