Ohm Sornil
Kanyanut Homsapaya
2022-06-09
2017
b201179
https://repository.nida.ac.th/handle/662723737/5874

In this dissertation, a method of feature selection for machine learning, and more particularly for supervised learning, is presented. Supervised learning is a machine learning task that infers answers from a training data set. In machine learning, training datasets are used to build a model that enables reasonable predictions; in supervised learning, each training example consists of an instance and a label, and the learning objective is to predict the label of a new, unseen instance with as few errors as possible. In recent years, many learning algorithms that perform fairly well have been proposed. Successful model building depends on many factors, such as the noise and size of the data. Most learning algorithms assume that training data is represented by a vector of numerical measurements, each of which is a feature, and an important question in machine learning is how to represent instances with such vectors so as to yield high learning performance.

Nowadays, data volumes are tremendously large, for instance in the number of features, and most machine learning and data mining techniques may not be effective for high-dimensional data: query accuracy and efficiency degrade rapidly as the dimension increases, the so-called curse of dimensionality. One requirement of a good representation is conciseness, since a representation that uses too many features incurs major computational difficulties and may lead to poor prediction performance. Attribute selection is one of the significant methods whose objective is to choose a small subset of features that predicts the target sufficiently well. Feature selection selects the most important features, eliminates irrelevant and redundant features from the entire set of attributes, reduces the computational complexity of any learning and prediction algorithm used in the process, and reduces cost by excluding unselected features.

A floating search is commonly used for the search process. Floating searches are heuristic search methods that dynamically change the number of attributes included or eliminated at each step, and they have produced very good results. The principal improvement of this thesis focuses on filter-based feature selection using a genetic algorithm. Filters are normally less computationally intensive than wrapper methods, because wrappers apply a predictive model to score feature subsets; the filter approach is fast to compute while still capturing the goodness of the feature subsets. The genetic algorithm (GA) helps to maintain diversity in the population and provides a way to reduce the search space. Moreover, the contribution improves the contemporary sequential forward floating selection (SFFS) algorithm. In this thesis, an improving step based on a genetic algorithm is proposed as an additional step in the floating search; a minimal illustrative sketch of this procedure follows below. The objective is to eliminate weak features and replace them with predominant ones at each sequential step. From the research observations, the proposed method was found to be beneficial in selecting features that boost the accuracy of data classification.
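To make the shape of the proposed search concrete, the following is a minimal Python sketch of a filter-based SFFS loop with a GA improving step. It assumes a simple correlation-based filter criterion; the names j_score, ga_improve, and sffs_ga, the GA parameters, and the criterion itself are illustrative assumptions, not the dissertation's actual implementation.

```python
import random
import numpy as np

def j_score(X, y, subset):
    """Filter criterion: mean absolute Pearson correlation between each
    selected feature and the label. No classifier is trained, which is
    what makes this a filter rather than a wrapper. This criterion is an
    illustrative stand-in, not the dissertation's actual measure."""
    if not subset:
        return 0.0
    return float(np.mean([abs(np.corrcoef(X[:, f], y)[0, 1]) for f in subset]))

def ga_improve(X, y, subset, n_features, generations=15, pop_size=12, p_mut=0.2):
    """GA improving step (hypothetical parameters): evolve a population of
    candidate subsets seeded with the current one, and return the best
    subset found only if it strictly beats the input, so weak features
    get swapped for predominant ones without degrading the criterion."""
    k = len(subset)

    def crossover(a, b):
        # Child draws k feature indices from the parents' union.
        union = list(a | b)
        return frozenset(random.sample(union, min(k, len(union))))

    def mutate(ind):
        # With probability p_mut, swap one selected feature for a random one.
        ind = set(ind)
        if ind and random.random() < p_mut:
            ind.remove(random.choice(sorted(ind)))
            ind.add(random.randrange(n_features))
        return frozenset(ind)

    pop = [frozenset(subset)] + [
        frozenset(random.sample(range(n_features), k)) for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        pop.sort(key=lambda s: j_score(X, y, list(s)), reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection
        children = [
            mutate(crossover(random.choice(parents), random.choice(parents)))
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    best = max(pop, key=lambda s: j_score(X, y, list(s)))
    if j_score(X, y, list(best)) > j_score(X, y, list(subset)):
        return list(best)
    return list(subset)

def sffs_ga(X, y, target_size):
    """SFFS skeleton with the GA improving step inserted after the usual
    forward and conditional backward steps."""
    n_features = X.shape[1]
    selected = []
    while len(selected) < target_size:
        # Forward step: add the single feature that most improves J.
        remaining = [f for f in range(n_features) if f not in selected]
        best_f = max(remaining, key=lambda f: j_score(X, y, selected + [f]))
        selected.append(best_f)
        # Conditional backward step: remove a feature (other than the one
        # just added) if the reduced subset scores strictly better.
        while len(selected) > 2:
            candidates = [f for f in selected if f != best_f]
            worst = max(candidates,
                        key=lambda f: j_score(X, y, [g for g in selected if g != f]))
            if j_score(X, y, [g for g in selected if g != worst]) > j_score(X, y, selected):
                selected.remove(worst)
            else:
                break
        # GA improving step: try to swap weak features for stronger ones.
        selected = ga_improve(X, y, selected, n_features)
    return selected

if __name__ == "__main__":
    # Tiny synthetic demo: features 3 and 7 drive the label.
    rng = np.random.default_rng(0)
    X = rng.random((200, 30))
    y = (X[:, 3] + X[:, 7] > 1.0).astype(float)
    print(sffs_ga(X, y, target_size=5))
```

Seeding the GA population with the current subset means the improving step can only replace the subset when a strictly better one is found, which mirrors the stated goal of eliminating weak features at each sequential step without hurting the selection criterion.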
Moreover, the experimental outcomes show that the proposed method with the genetic algorithm enhanced classification accuracy and reduced data dimensionality for supervised learning problems.

110 leaves
application/pdf
eng
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Supervised learning
Machine learning
Genetic algorithm
Feature selection using genetic algorithm
text--thesis--doctoral thesis
10.14457/NIDA.the.2017.43