Grid-based supervised clustering algorithm using greedy and gradient descent methods to build clusters
Issued Date
2012
Language
eng
File Type
application/pdf
No. of Pages/File Size
ix, 81 leaves : ill ; 30 cm.
Rights
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Physical Location
National Institute of Development Administration. Library and Information Center
Citation
Pornpimol Bungkomkhun (2012). Grid-based supervised clustering algorithm using greedy and gradient descent methods to build clusters. Retrieved from: http://repository.nida.ac.th/handle/662723737/277.
Title
Grid-based supervised clustering algorithm using greedy and gradient descent methods to build clusters
Author(s)
Pornpimol Bungkomkhun
Abstract
Clustering analysis is one of the primary data mining tasks; its objective is to understand the natural grouping (or structure) of data objects in a dataset. Clustering aims to segment the entire dataset into relatively homogeneous subgroups, or clusters, such that the similarity of data objects within a cluster is maximized and the similarity of data objects belonging to different clusters is minimized. In supervised clustering, not only the attribute variables of the data objects but also their class variable takes part in grouping the data objects into clusters, so that each cluster has high homogeneity in terms of the classes of its data objects. This dissertation proposes a grid-based supervised clustering algorithm that can identify clusters of any shape and size without presuming any canonical form for the data distribution. The algorithm needs no pre-specified number of clusters and is insensitive to the order of the input data objects. The proposed algorithm gradually partitions the data space into equal-size grid cells, one dimension at a time. A greedy method chooses the order in which the dimensions are partitioned so as to give the best clustering quality, while a gradient descent method finds the optimal number of intervals for each partitioning. After all dimensions have been partitioned, connected dense grid cells containing a majority of data objects from the same class are merged into a cluster. By using the greedy and gradient descent methods in this way, the proposed algorithm produces high-quality clusters while reducing the time needed to find the best partitioning and avoiding the memory confinement problem during the process. On two-dimensional synthetic datasets, the proposed algorithm correctly identifies clusters of different shapes and sizes. It also outperforms five other supervised clustering algorithms when evaluated on some UCI datasets.
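The final merging step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function name `grid_supervised_clusters`, the parameters `intervals` and `min_points`, the strict-majority rule, and the use of face adjacency to decide which cells are "connected" are all assumptions made for illustration. The greedy ordering of dimensions and the gradient descent search for interval counts are not reproduced; the number of intervals per dimension is simply taken as given.

```python
from collections import Counter, defaultdict, deque


def grid_supervised_clusters(points, labels, intervals, min_points=2):
    """Illustrative sketch only (names and thresholds are assumptions).

    Bins points into equal-width grid cells, keeps cells that are dense
    and have a strict majority class, then merges face-adjacent cells
    that share the same majority class into one cluster.
    """
    dims = len(intervals)
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]

    def cell_of(p):
        idx = []
        for d in range(dims):
            # Equal-width intervals; fall back to width 1.0 for a flat dimension.
            width = (highs[d] - lows[d]) / intervals[d] or 1.0
            i = int((p[d] - lows[d]) / width)
            idx.append(min(i, intervals[d] - 1))  # the maximum goes in the last cell
        return tuple(idx)

    # Map each occupied cell to the indices of the points it contains.
    members = defaultdict(list)
    for i, p in enumerate(points):
        members[cell_of(p)].append(i)

    # Keep only dense cells whose most common class is a strict majority.
    majority = {}
    for cell, idxs in members.items():
        if len(idxs) >= min_points:
            cls, count = Counter(labels[i] for i in idxs).most_common(1)[0]
            if count > len(idxs) / 2:
                majority[cell] = cls

    # Breadth-first search over face-adjacent cells with the same majority class.
    clusters, seen = [], set()
    for start in majority:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            cell = queue.popleft()
            component.append(cell)
            for d in range(dims):
                for step in (-1, 1):
                    nb = cell[:d] + (cell[d] + step,) + cell[d + 1:]
                    if nb not in seen and majority.get(nb) == majority[start]:
                        seen.add(nb)
                        queue.append(nb)
        point_ids = sorted(i for c in component for i in members[c])
        clusters.append((majority[start], point_ids))
    return clusters
```

Each returned pair holds a class label and the sorted indices of the points in that cluster. Points that fall in sparse or mixed cells are left unassigned, which is one simple way to treat noise under these assumptions.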
Description
Thesis (Ph.D. (Computer Science))--National Institute of Development Administration, 2012