Feature selection using genetic algorithm
Issued Date
2017
Language
eng
File Type
application/pdf
No. of Pages/File Size
110 leaves
Other identifier(s)
b201179
Rights
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Physical Location
National Institute of Development Administration. Library and Information Center
Bibliographic Citation
Kanyanut Homsapaya (2017). Feature selection using genetic algorithm. Retrieved from: https://repository.nida.ac.th/handle/662723737/5874.
Title
Feature selection using genetic algorithm
Author(s)
Kanyanut Homsapaya
Abstract
In this dissertation, a method of feature selection in machine learning, and more particularly in supervised learning, is presented. Supervised learning is a machine learning task that infers answers from a training data set. In machine learning, training datasets are used to build a model that enables reasonable predictions; in supervised learning, each training example consists of an instance and a label, and the learning objective is to predict the label of a new, unseen instance with as few errors as possible. In recent years, many learning algorithms that perform fairly well have been proposed. Successful model building depends on many factors, such as the noise in and the size of the data. Most learning algorithms assume that training data is represented by a vector of numerical measurements, each of which is a feature, and an important question in machine learning is how to represent instances using vectors of these features so as to yield high learning performance.
Nowadays, data volumes are tremendously large in terms of, for example, the number of features, and most machine learning and data mining techniques may not be effective on high-dimensional data: query accuracy and efficiency degrade quickly as the dimension increases, the so-called curse of dimensionality. One requirement of a good representation is conciseness, since a representation that uses too many features incurs major computational difficulties and may lead to poor prediction performance. Attribute selection is one of the significant methods whose objective is to choose a small subset of features that predicts the target sufficiently well. Feature selection selects the most important features, eliminates irrelevant and redundant features from the entire set of attributes, reduces the computational complexity of any learning and prediction algorithm used in the process, and reduces cost by excluding the unselected features.
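The idea of a filter-style selection can be sketched in a few lines. The sketch below is an illustrative assumption, not the method of the dissertation: it scores each feature by the absolute Pearson correlation with the label and keeps the top k, using only the Python standard library.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def select_top_k(X, y, k):
    """Toy filter: rank features by |correlation with the label|, keep top k.

    X is a list of rows (each row a list of feature values); returns the
    sorted indices of the k selected features.
    """
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])
```

A univariate score like this is cheap but ignores redundancy between features, which is exactly the gap that subset-search methods such as floating search aim to close.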
A floating search is commonly used for the search process. Floating searches are heuristic search methods that dynamically change the number of attributes included or eliminated at each step, and they have produced very good results. The principal contribution of this thesis is filter-based feature selection using a genetic algorithm (GA). Filters are normally less computationally intensive than wrapper methods, because wrappers apply a predictive model to score feature subsets; the filter approach is fast to compute while still capturing the goodness of a feature subset. The GA helps to gain more diversity in the population of candidate subsets and provides a way of reducing the search space. Moreover, the contribution improves the contemporary sequential forward floating selection (SFFS) algorithm: a feature-improvement step using a genetic algorithm is proposed as an additional step in the floating search, with the objective of eliminating weak features and replacing them with predominant ones at each sequential step. From the research observations, the proposed method was found to be beneficial in selecting features that boost the accuracy of data classification. Moreover, the experimental results show that the proposed method with the genetic algorithm enhanced classification accuracy and reduced data dimensionality for supervised learning problems.
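To make the GA component concrete, here is a minimal sketch of a genetic search over feature-subset bitmasks. It is an assumption-laden toy, not the dissertation's algorithm: fitness is simply the sum of per-feature filter scores, subsets are held at a fixed size (mirroring one step of a floating search), and selection, one-point crossover, and point mutation are the plain textbook operators.

```python
import random

def ga_feature_search(scores, n_select, pop_size=20, generations=30, seed=0):
    """Toy GA over feature-subset bitmasks.

    `scores` holds a per-feature relevance score (e.g. from a filter);
    the fitness of a subset is the sum of the scores of its selected
    features. Subsets are repaired to exactly `n_select` features.
    """
    rng = random.Random(seed)
    n = len(scores)

    def random_mask():
        idx = set(rng.sample(range(n), n_select))
        return [1 if j in idx else 0 for j in range(n)]

    def fitness(mask):
        return sum(s for s, m in zip(scores, mask) if m)

    def repair(mask):
        # Enforce exactly n_select selected features by flipping random bits.
        ones = [j for j, m in enumerate(mask) if m]
        zeros = [j for j, m in enumerate(mask) if not m]
        while len(ones) > n_select:
            j = ones.pop(rng.randrange(len(ones)))
            mask[j] = 0
            zeros.append(j)
        while len(ones) < n_select:
            j = zeros.pop(rng.randrange(len(zeros)))
            mask[j] = 1
            ones.append(j)
        return mask

    pop = [random_mask() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]        # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)           # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(n)] ^= 1        # point mutation
            children.append(repair(child))
        pop = survivors + children
    best = max(pop, key=fitness)
    return sorted(j for j, m in enumerate(best) if m)
```

In the thesis's setting this kind of step would be invoked inside the SFFS loop, so that weak members of the current subset can be swapped for predominant features discovered by the population; the bitmask encoding is what gives the GA its population diversity over candidate subsets.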