Grid-based supervised clustering algorithm using greedy and gradient descent methods to build clusters

dc.contributor.advisor: Surapong Auwatanamongkol, advisor
dc.contributor.author: Pornpimol Bungkomkhun
dc.description: Thesis (Ph.D. (Computer Science))--National Institute of Development Administration, 2012
dc.description.abstract: Clustering analysis is one of the primary methods of data mining; its objective is to understand the natural grouping (or structure) of data objects in a dataset. Clustering aims to segment the entire dataset into relatively homogeneous subgroups, or clusters, such that the similarity of data objects within a cluster is maximized and the similarity of data objects belonging to different clusters is minimized. In supervised clustering, both the attribute variables and the class variable of the data objects take part in grouping the objects into clusters, so that each cluster is highly homogeneous in terms of the classes of its data objects. This dissertation proposes a grid-based supervised clustering algorithm that can identify clusters of any shape and size without presuming any canonical form for the data distribution. The algorithm needs no pre-specified number of clusters and is insensitive to the order of the input data objects. It gradually partitions the data space into equal-size grid cells, one dimension at a time. A greedy method chooses the order in which dimensions are partitioned so as to give the best clustering quality, while a gradient descent method finds the optimal number of intervals for each partitioning. After all dimensions have been partitioned, connected dense grid cells in which the majority of data objects belong to the same class are merged into a cluster. By using the greedy and gradient descent methods in this way, the proposed algorithm can produce high-quality clusters while reducing the time needed to find the best partitioning and avoiding the memory confinement problem during the process. On two-dimensional synthetic datasets, the proposed algorithm correctly identifies clusters with different shapes and sizes. The proposed algorithm also outperforms five other supervised clustering algorithms when performed on some UCI
dc.format.extent: ix, 81 leaves : ill ; 30
dc.publisher: National Institute of Development Administration
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
dc.subject.lcc: QA 278 P826 2012
dc.subject.other: Cluster analysis
dc.title: Grid-based supervised clustering algorithm using greedy and gradient descent methods to build clusters
dc.type: text--thesis--doctoral thesis
mods.physicalLocation: National Institute of Development Administration. Library and Information Center
of Applied Statistics
Science
Institute of Development Administration
of Philosophy
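
The final step described in the abstract, merging connected dense grid cells that share a majority class, can be illustrated with a short Python sketch. This is not the thesis implementation. It assumes the number of intervals per dimension is already fixed, so the greedy ordering of dimensions and the gradient descent search for interval counts are left out. The function names, the `min_points` density threshold, and the choice to treat two cells as connected when they share a face are all assumptions made for illustration.

```python
from collections import Counter, defaultdict, deque


def assign_cells(points, intervals):
    """Map each point to the index tuple of its equal-width grid cell."""
    dims = len(intervals)
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    cells = []
    for p in points:
        idx = []
        for d in range(dims):
            # Fall back to width 1.0 when a dimension has zero range.
            width = (highs[d] - lows[d]) / intervals[d] or 1.0
            # Clamp so the maximum value lands in the last interval.
            idx.append(min(int((p[d] - lows[d]) / width), intervals[d] - 1))
        cells.append(tuple(idx))
    return cells


def merge_dense_cells(points, labels, intervals, min_points=2):
    """Merge face-adjacent dense cells with the same majority class.

    Returns a list of (majority_class, sorted_cell_indices) pairs.
    min_points is an assumed density threshold, not a value from the thesis.
    """
    counts = defaultdict(Counter)
    for cell, label in zip(assign_cells(points, intervals), labels):
        counts[cell][label] += 1

    # Keep only dense cells and record each one's majority class.
    majority = {
        cell: c.most_common(1)[0][0]
        for cell, c in counts.items()
        if sum(c.values()) >= min_points
    }

    clusters, seen = [], set()
    for start in majority:
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:  # breadth-first search over neighbouring cells
            cell = queue.popleft()
            members.append(cell)
            for d in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:d] + (cell[d] + step,) + cell[d + 1:]
                    if (nb in majority and nb not in seen
                            and majority[nb] == majority[start]):
                        seen.add(nb)
                        queue.append(nb)
        clusters.append((majority[start], sorted(members)))
    return clusters
```

Calling `merge_dense_cells(points, labels, [4, 4])` on two-dimensional points partitions each axis into four intervals and returns one entry per merged cluster. Because clusters are built from cell adjacency and not from distances to a centroid, the merged regions can take arbitrary shapes, which is the property the abstract emphasizes.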