Jo May 22, 2023

Cluster analysis is applied in many fields of engineering and social science. One of its most important problems is determining the number of clusters correctly. The K-means clustering algorithm is widely used because of its simplicity, but its application is restricted by the fact that K must be chosen in advance. Correct selection of K is therefore especially important for this algorithm.

Many approaches have been proposed to infer K and to estimate the significance of the clustering result. In those approaches, the range of K is determined first, clustering is performed for every possible value of K in that range, and only then is the cluster number chosen. The drawback is high computational complexity when there are many data points or the data is high-dimensional. This would seem to become more serious when big datasets must be clustered or when density functions are involved.
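As a rough illustration of the exhaustive search described above, the following Python sketch runs a basic K-means once for every K in a chosen range and records the within-cluster sum of squares (WCSS) for each run. The helper names (kmeans, sweep_k), the synthetic data, and the use of WCSS as the score are assumptions made for illustration only; they are not taken from the paper.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means (illustrative); returns (centroids, WCSS)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct data points as initial centroids
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    wcss = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, wcss


def sweep_k(points, k_min, k_max):
    """Conventional approach: one full clustering run per candidate K."""
    return {k: kmeans(points, k)[1] for k in range(k_min, k_max + 1)}


# Synthetic data (an assumption): three Gaussian blobs in two dimensions.
data_rng = random.Random(42)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [
    (data_rng.gauss(cx, 1.0), data_rng.gauss(cy, 1.0))
    for cx, cy in blob_centers
    for _ in range(50)
]

wcss_by_k = sweep_k(points, 1, 6)
for k, wcss in wcss_by_k.items():
    print(k, round(wcss, 1))
```

Every candidate K costs a complete clustering run over all points, so the total work grows with the width of the K range as well as with the number of points and dimensions. That repeated cost is what a method that avoids clustering for every K sets out to remove.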

Ri Yong Ae, a lecturer at the Faculty of Applied Mathematics, has proposed a new method to determine the cluster number without clustering for every K in K-means.

First, she introduced a new statistic, RVR (ratio of variance to range), and proposed an algorithm based on it to determine the cluster number K and perform clustering.

Then, to evaluate its effectiveness, she ran simulation tests on different types of datasets.

She concluded that the proposed method significantly improves the speed and quality of both cluster number determination and clustering, and that the proposed algorithm could be used efficiently and widely for clustering multi-dimensional data.

If further information is needed, please refer to her paper “A New Method to Determine Cluster Number without Clustering for Every K based on Ratio of Variance to Range (RVR) in K-Means” in “Mathematical Problems in Engineering” (SCI).