K-means clustering is useful when you want to separate your dataset into groups that are not immediately obvious. It can be used to validate business assumptions about the types of groups that exist, or to identify unknown groups in complex datasets. K-means is one of the simplest unsupervised learning algorithms for solving clustering problems.
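Note: Grouping results may not be identical each time the analysis is run, because k-means typically starts from randomly chosen cluster centers and different starting points can lead to different final groupings.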
Run K-Means Clustering
- Drag variables of interest into the Selected Variable(s) box (A)
- Specify how many clusters to divide the sample into (B)
- Select the maximum number of iterations for the algorithm to run (C)
- Create a cluster and/or distance variable based on this analysis by clicking on the Import Results tab (D)
- Output will display in the right panel (E)
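For readers curious about what the number of clusters and the maximum number of iterations control, the following is a minimal sketch of the standard k-means procedure (Lloyd's algorithm) in plain Python. It is not the application's own implementation; the function name `kmeans`, its parameters, and the sample data are all hypothetical and chosen only for illustration.

```python
import random


def kmeans(points, n_clusters, max_iter=10, seed=None):
    """Illustrative k-means sketch: returns (labels, centroids)."""
    rng = random.Random(seed)
    # Start from n_clusters observations chosen at random (an assumption;
    # real tools may use a different initialization).
    centroids = [list(p) for p in rng.sample(points, n_clusters)]
    labels = [None] * len(points)

    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(range(n_clusters),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing before max_iter was reached
        labels = new_labels

        # Update step: move each centroid to the mean of its members.
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]

    return labels, centroids


# Hypothetical data: one row per respondent, one column per variable.
data = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.2), (8.1, 7.9), (7.9, 8.0)]
labels, centroids = kmeans(data, n_clusters=2, max_iter=10, seed=0)
```

In this sketch the loop ends early once cluster assignments stop changing, so the maximum number of iterations acts only as an upper bound. Which cluster is numbered 0 or 1 depends on the random starting points.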
In your output, sizes refers to the number of respondents in each cluster, listed in cluster order. A table of cluster means for each variable is generated, along with the sum of squares for each cluster. Cluster distance refers to each respondent's distance from the closest cluster centroid. If you chose to import your cluster and distance variables, they will be created and displayed on the variables list page.
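To make these output quantities concrete, here is a small sketch that computes them by hand from a set of cluster assignments. It assumes the sum of squares is the within-cluster sum of squared deviations from the cluster mean and that distance is Euclidean; the application may define them differently. The data, labels, and variable names are hypothetical.

```python
import math

# Hypothetical inputs: each row is one respondent, and labels holds the
# cluster assigned to that respondent (values are illustrative only).
data = [(1.0, 2.0), (3.0, 4.0), (10.0, 10.0), (12.0, 14.0)]
labels = [0, 0, 1, 1]
n_clusters = 2

sizes, means, within_ss = [], [], []
for c in range(n_clusters):
    members = [row for row, lab in zip(data, labels) if lab == c]
    # Cluster mean of each variable (the cluster centroid).
    center = [sum(col) / len(members) for col in zip(*members)]
    sizes.append(len(members))
    means.append(center)
    # Assumed definition: sum of squared deviations from the cluster mean.
    within_ss.append(
        sum((x - m) ** 2 for row in members for x, m in zip(row, center))
    )

# Distance from each respondent to the nearest cluster centroid
# (Euclidean distance is an assumption here).
distances = [min(math.dist(row, center) for center in means) for row in data]

print(sizes)      # [2, 2]
print(means)      # [[2.0, 3.0], [11.0, 12.0]]
print(within_ss)  # [4.0, 10.0]
```

A smaller sum of squares indicates a tighter cluster, and a respondent with a large distance value sits far from even its closest centroid.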