Grid-based supervised clustering algorithm using greedy and gradient descent methods to build clusters
by Pornpimol Bungkomkhun
Title: | Grid-based supervised clustering algorithm using greedy and gradient descent methods to build clusters |
Author(s): | Pornpimol Bungkomkhun |
Advisor: | Surapong Auwatanamongkol, advisor |
Degree name: | Doctor of Philosophy |
Degree level: | Doctoral |
Degree discipline: | Computer Science |
Degree department: | School of Applied Statistics |
Degree grantor: | National Institute of Development Administration |
Issued date: | 2012 |
Digital Object Identifier (DOI): | 10.14457/NIDA.the.2012.6 |
Publisher: | National Institute of Development Administration |
Abstract: |
Clustering analysis is one of the primary data mining methods; its objective is to understand the natural grouping (or structure) of data objects in a dataset. Clustering aims to segment the entire dataset into relatively homogeneous subgroups, or clusters, such that the similarity of data objects within a cluster is maximized and the similarity of data objects belonging to different clusters is minimized. In supervised clustering, both the attribute variables and the class variable of the data objects take part in grouping the data objects into clusters, so that each cluster has high homogeneity in terms of the classes of its data objects. This dissertation proposes a grid-based supervised clustering algorithm that can identify clusters of any shape and size without presuming any canonical form for the data distribution. The algorithm requires no pre-specified number of clusters and is insensitive to the order of the input data objects. The proposed algorithm gradually partitions the data space into equal-size grid cells, one dimension at a time. A greedy method is used to choose the order in which dimensions are partitioned so as to give the best clustering quality, while a gradient descent method is used to find the optimal number of intervals for each partitioning. After all dimensions have been partitioned, connected dense grid cells in which the majority of data objects belong to the same class are merged into a cluster. By using the greedy and gradient descent methods in this way, the proposed algorithm can produce high-quality clusters while reducing the time needed to find the best partitioning and avoiding the memory confinement problem during the process. On two-dimensional synthetic datasets, the proposed algorithm correctly identifies clusters of different shapes and sizes. The proposed algorithm also outperforms five other supervised clustering algorithms on some UCI datasets. |
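The final merging step described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the function name `grid_clusters`, the `min_count` density threshold, the use of face-sharing cells as "connected", and the fixed `intervals` argument are all assumptions made for illustration. In particular, the greedy ordering of dimensions and the gradient descent search for the number of intervals are not reproduced here; the interval counts are simply supplied by the caller.

```python
from collections import Counter, defaultdict, deque


def grid_clusters(points, labels, intervals, min_count=2):
    """Merge connected dense grid cells that share a majority class.

    points    -- list of equal-length tuples of floats
    labels    -- class label for each point
    intervals -- number of equal-width intervals per dimension (assumed given)
    min_count -- hypothetical density threshold: minimum points per dense cell
    """
    dims = len(intervals)
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]

    def cell_of(p):
        idx = []
        for d in range(dims):
            # Equal-width intervals; fall back to 1.0 if the dimension is constant.
            width = (highs[d] - lows[d]) / intervals[d] or 1.0
            i = int((p[d] - lows[d]) / width)
            idx.append(min(i, intervals[d] - 1))  # keep the maximum in the last cell
        return tuple(idx)

    # Count class labels per grid cell.
    counts = defaultdict(Counter)
    for p, y in zip(points, labels):
        counts[cell_of(p)][y] += 1

    # Keep dense cells in which one class holds a strict majority.
    majority = {}
    for cell, c in counts.items():
        total = sum(c.values())
        label, top = c.most_common(1)[0]
        if total >= min_count and 2 * top > total:
            majority[cell] = label

    # Breadth-first merge of face-adjacent cells with the same majority class.
    clusters, seen = [], set()
    for start in sorted(majority):
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            cell = queue.popleft()
            members.append(cell)
            for d in range(dims):
                for step in (-1, 1):
                    nb = cell[:d] + (cell[d] + step,) + cell[d + 1:]
                    if (nb in majority and nb not in seen
                            and majority[nb] == majority[start]):
                        seen.add(nb)
                        queue.append(nb)
        clusters.append((majority[start], sorted(members)))
    return clusters
```

For example, calling `grid_clusters` on a few two-dimensional points with `intervals=[4, 4]` returns one `(class, cells)` pair per merged cluster. Because cells are processed in sorted order rather than input order, the result does not depend on the order of the data objects, which mirrors the order-insensitivity claimed in the abstract.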
Description: |
Thesis (Ph.D. (Computer Science))--National Institute of Development Administration, 2012 |
Subject(s): | Cluster analysis; Algorithms |
Resource type: | Dissertation |
Extent: | ix, 81 leaves : ill ; 30 cm. |
Type: | Text |
File type: | application/pdf |
Language: | eng |
Rights: | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. |
URI: | http://repository.nida.ac.th/handle/662723737/277 |
All information resources in the Wisdom Repository are to be used for teaching, learning, and research purposes only, and the source must be cited whenever they are used. Modifying the content, making further copies, and any commercial use are prohibited in all cases.