K Means Cluster Analysis Calculator

Calculator Form

Enter one point per line in x,y format. Only the first two values on a line are used; any further values are ignored.
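The input rule above can be sketched in Python as follows. This is only an illustration of the stated behavior, not the calculator's own parser: the function name `parse_points` is hypothetical, and the handling of blank lines (skipped) and of lines with fewer than two numbers (rejected) is an assumption, since the page does not say what happens in those cases.

```python
def parse_points(text):
    """Parse lines of 'x,y' text into a list of (x, y) float tuples."""
    points = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are skipped
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 2:
            # assumption: a line with fewer than two values is an error
            raise ValueError(f"line {line_no}: expected at least two values")
        try:
            x, y = float(fields[0]), float(fields[1])
        except ValueError:
            raise ValueError(f"line {line_no}: non-numeric value") from None
        points.append((x, y))  # values after the first two are ignored
    return points


sample = "1.0,1.2\n1.4,0.8,99\n\n5.2, 5.0"
print(parse_points(sample))
```

Here the third value on the second line (99) is dropped, and the empty line contributes no point.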

Dataset Points

Number of Clusters (k)

Maximum Iterations

Convergence Tolerance

Initialization Method

Random Seed

Normalize input data with z-score standardization

Example Data Table

Use this sample to test three compact groups in a two-dimensional space.

Point	X	Y	Suggested Group Pattern
P1	1.0	1.2	Lower-left cluster
P2	1.4	0.8	Lower-left cluster
P3	0.9	1.5	Lower-left cluster
P4	1.6	1.1	Lower-left cluster
P5	5.2	5.0	Center cluster
P6	4.8	5.4	Center cluster
P7	5.6	4.9	Center cluster
P8	5.1	5.8	Center cluster
P9	8.8	1.5	Right cluster
P10	9.1	1.0	Right cluster
P11	8.5	2.0	Right cluster
P12	9.4	1.7	Right cluster

Formula Used

K-means repeatedly assigns each point to the nearest centroid and then recomputes each centroid as the mean of its assigned points.

Assignment Rule: cluster(i) = arg minⱼ ||xᵢ - μⱼ||² (the index j of the centroid nearest to point xᵢ)

Centroid Update: μⱼ = (1 / nⱼ) × Σ xᵢ, summed over the nⱼ points currently assigned to cluster j

Objective Score: SSE = Σᵢ ||xᵢ - μ_cluster(i)||², where μ_cluster(i) is the centroid of the cluster assigned to point i

Optional Scaling: z = (x - mean) / standard deviation
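The assignment rule, centroid update, and SSE above fit together roughly as in the sketch below. It is a minimal illustration under stated assumptions, not the calculator's actual code: the function names, the `init` parameter, the choice of random input points as default starting centroids, the rule that an empty cluster keeps its old centroid, and the use of the largest centroid movement as the convergence test are all assumptions. The data are the twelve sample points from the example table.

```python
import math
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def assign(points, centroids):
    """Assignment rule: index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, max_iter=100, tol=1e-4, seed=0, init=None):
    # Assumption: start from k distinct input points chosen at random,
    # unless explicit starting centroids are passed in `init`.
    if init is not None:
        centroids = list(init)
    else:
        centroids = random.Random(seed).sample(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Centroid update: mean of the assigned points.
                updated.append((sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members)))
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                updated.append(centroids[j])
        # Assumption: convergence is judged by the largest centroid movement.
        shift = max(math.sqrt(sq_dist(a, b)) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    labels = assign(points, centroids)
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, sse


points = [(1.0, 1.2), (1.4, 0.8), (0.9, 1.5), (1.6, 1.1),
          (5.2, 5.0), (4.8, 5.4), (5.6, 4.9), (5.1, 5.8),
          (8.8, 1.5), (9.1, 1.0), (8.5, 2.0), (9.4, 1.7)]

# Start from one point in each visible group (P1, P5, P9).
labels, centroids, sse = kmeans(points, k=3, init=[points[0], points[4], points[8]])
print(labels)
print(centroids)
print(round(sse, 4))
```

Starting from P1, P5, and P9, the loop settles after two passes, with each centroid at the mean of its four points. With a random start the result can differ, because k-means only finds a local optimum.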

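For a fixed k, lower SSE means tighter clusters. SSE generally keeps falling as k grows, so it should not be compared across different k values on its own. Silhouette values range from -1 to 1: values closer to 1 suggest cleaner separation, values near 0 indicate overlap, and negative values suggest points that may sit in the wrong cluster.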

How to Use This Calculator

Enter one two-dimensional point per line in x,y format.
Choose the number of clusters you want to detect.
Adjust iteration limit, tolerance, initialization, and seed.
Enable normalization when x and y scales differ greatly.
Submit the form to generate clusters, centroids, and quality metrics.
Review the summary cards, tables, and scatter plot.
Download CSV or PDF reports for records or sharing.

Frequently Asked Questions

1. What does k-means clustering do?

It groups similar points into k clusters by minimizing the total squared distance between each point and its assigned centroid. The method is fast, practical, and widely used for segmentation, pattern discovery, and exploratory analysis.

2. How do I choose the right k value?

Start with a reasonable guess from domain knowledge. Then compare results using SSE, silhouette score, and visual separation. A useful k should create compact clusters without forcing clearly different groups together.
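One caution when comparing k values: SSE alone cannot pick k, because the best achievable SSE never gets worse as clusters are split further, and it reaches 0 when every point is its own cluster. The sketch below shows this on the sample points from the example table by scoring three fixed labelings. The helper name `sse_for_labels` is hypothetical, and the labelings are written by hand rather than produced by the calculator.

```python
def sse_for_labels(points, labels):
    """Sum of squared distances from each point to the mean of its group."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    total = 0.0
    for members in groups.values():
        cx = sum(p[0] for p in members) / len(members)
        cy = sum(p[1] for p in members) / len(members)
        total += sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in members)
    return total


points = [(1.0, 1.2), (1.4, 0.8), (0.9, 1.5), (1.6, 1.1),
          (5.2, 5.0), (4.8, 5.4), (5.6, 4.9), (5.1, 5.8),
          (8.8, 1.5), (9.1, 1.0), (8.5, 2.0), (9.4, 1.7)]

one_cluster = [0] * 12                       # k = 1
three_groups = [0] * 4 + [1] * 4 + [2] * 4   # k = 3, the table's groups
every_point = list(range(12))                # k = 12

for name, labels in [("k=1", one_cluster), ("k=3", three_groups), ("k=12", every_point)]:
    print(name, round(sse_for_labels(points, labels), 4))
```

The large drop from k=1 to k=3 followed by only a small further gain is the pattern to look for. A k beyond that point lowers SSE without revealing more structure.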

3. Why would I enable normalization?

Normalization helps when one variable has a much larger scale than the other. Without scaling, the larger feature can dominate distance calculations and distort assignments.
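A small sketch of the effect, using three made-up points whose second coordinate is on a much larger scale than the first. The values are purely illustrative, the function names are hypothetical, and the use of the population standard deviation is an assumption, since the page does not say which variant the calculator applies.

```python
import statistics


def zscore_columns(points):
    """Standardize x and y separately: z = (value - mean) / standard deviation."""
    xs, ys = zip(*points)

    def scale(values):
        mean = statistics.mean(values)
        sd = statistics.pstdev(values)  # assumption: population standard deviation
        # A constant column has sd == 0; map it to zeros to avoid dividing by zero.
        return [(v - mean) / sd if sd > 0 else 0.0 for v in values]

    return list(zip(scale(xs), scale(ys)))


def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def nearest_to_first(points):
    """Index of the point closest to points[0]."""
    return min(range(1, len(points)), key=lambda i: sq_dist(points[0], points[i]))


# Illustrative values only: x spans tens, y spans tens of thousands.
raw = [(25, 30000), (60, 30500), (26, 30800)]

print(nearest_to_first(raw))                  # neighbor chosen on the raw scale
print(nearest_to_first(zscore_columns(raw)))  # neighbor chosen after scaling
```

On the raw scale the y differences swamp the x differences, so the first point looks closest to the second even though their x values are far apart. After standardization both coordinates contribute comparably, and the third point becomes the nearest.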

4. What is SSE in this calculator?

SSE is the within-cluster sum of squared errors: the squared distance from each point to its assigned centroid, added up over all points. It measures how tightly points sit around their centroids. For the same k, smaller values indicate more compact clusters.

5. What does the silhouette score mean?

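The silhouette score compares how close a point is to the other points in its own cluster with how close it is to the points of the nearest neighboring cluster. It ranges from -1 to 1. Higher positive values suggest cleaner boundaries, values near zero often indicate overlap, and negative values suggest a point may belong to a neighboring cluster.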
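For reference, the usual definition is s = (b - a) / max(a, b), where a is a point's mean distance to the other members of its own cluster and b is its lowest mean distance to the members of any other cluster. The sketch below applies that definition to the sample points from the example table. It assumes Euclidean distance and scores single-point clusters as 0, and it may differ in detail from what the calculator reports.

```python
import math


def silhouette_scores(points, labels):
    """Per-point silhouette values s = (b - a) / max(a, b)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1 or len(clusters) < 2:
            scores.append(0.0)  # assumption: undefined cases are scored as 0
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself is in the sum, hence len - 1).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


points = [(1.0, 1.2), (1.4, 0.8), (0.9, 1.5), (1.6, 1.1),
          (5.2, 5.0), (4.8, 5.4), (5.6, 4.9), (5.1, 5.8),
          (8.8, 1.5), (9.1, 1.0), (8.5, 2.0), (9.4, 1.7)]

good = [0] * 4 + [1] * 4 + [2] * 4   # the table's three groups
bad = [0, 1] * 6                     # arbitrary alternating labels

for name, labels in [("table groups", good), ("alternating", bad)]:
    scores = silhouette_scores(points, labels)
    print(name, round(sum(scores) / len(scores), 3))
```

The table's grouping scores close to 1, while the arbitrary alternating labels score far lower, which is the contrast the metric is meant to expose.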

6. Why can different runs give different results?

K-means can start from different initial centroids, especially with random initialization methods. Different starts may converge to different local optima, so cluster assignments and SSE can vary between runs. Use a fixed seed for repeatable results.
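The role of the seed can be illustrated with Python's standard `random` module. This is a sketch of seeded random initialization in general; the function name is hypothetical, and the calculator's own random number generator and initialization methods are not described on this page.

```python
import random


def pick_initial_centroids(points, k, seed=None):
    """Choose k distinct input points as starting centroids."""
    rng = random.Random(seed)  # a private generator, so the seed fully controls it
    return rng.sample(points, k)


points = [(1.0, 1.2), (1.4, 0.8), (0.9, 1.5), (1.6, 1.1),
          (5.2, 5.0), (4.8, 5.4), (5.6, 4.9), (5.1, 5.8),
          (8.8, 1.5), (9.1, 1.0), (8.5, 2.0), (9.4, 1.7)]

first = pick_initial_centroids(points, 3, seed=42)
second = pick_initial_centroids(points, 3, seed=42)
print(first == second)  # the same seed reproduces the same starting centroids
```

With `seed=None` the generator is seeded from system entropy, so repeated runs may start from different points and can end in different clusterings.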

7. Can I use this calculator for more than two features?

This page is designed for two-dimensional input so the graph stays clear and practical. For higher dimensions, the same logic applies, but visualization and input handling become more complex.

8. When is k-means not a good choice?

K-means is weaker when clusters are highly irregular, very unequal in size, or full of extreme outliers. In those cases, other clustering methods may describe the data better.