Calculator Form
Enter one point per line in x,y format. If a line contains more than two numbers, only the first two are used.
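As a rough illustration of this input rule, the sketch below parses pasted text into (x, y) pairs and drops any values after the first two on a line. The function name `parse_points` is hypothetical. Skipping blank lines and rejecting lines with fewer than two values are assumptions, because the page does not say how the calculator handles those cases.

```python
def parse_points(text):
    """Parse 'x,y' lines into (x, y) float tuples (hypothetical helper)."""
    points = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are skipped
        parts = [part.strip() for part in line.split(",")]
        if len(parts) < 2:
            # assumption: a line without two values is treated as an error
            raise ValueError(f"expected at least two values: {line!r}")
        # Only the first two values are used; anything after them is ignored.
        points.append((float(parts[0]), float(parts[1])))
    return points


print(parse_points("1.0,1.2\n1.4, 0.8, 99\n\n0.9,1.5"))
# [(1.0, 1.2), (1.4, 0.8), (0.9, 1.5)]
```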
Example Data Table
Use this sample to test the calculator on three compact groups in two-dimensional space.
| Point | X | Y | Expected Cluster |
|---|---|---|---|
| P1 | 1.0 | 1.2 | Lower-left cluster |
| P2 | 1.4 | 0.8 | Lower-left cluster |
| P3 | 0.9 | 1.5 | Lower-left cluster |
| P4 | 1.6 | 1.1 | Lower-left cluster |
| P5 | 5.2 | 5.0 | Center cluster |
| P6 | 4.8 | 5.4 | Center cluster |
| P7 | 5.6 | 4.9 | Center cluster |
| P8 | 5.1 | 5.8 | Center cluster |
| P9 | 8.8 | 1.5 | Right cluster |
| P10 | 9.1 | 1.0 | Right cluster |
| P11 | 8.5 | 2.0 | Right cluster |
| P12 | 9.4 | 1.7 | Right cluster |
Formula Used
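K-means repeatedly assigns each point to its nearest centroid and then recomputes each centroid as the mean of its assigned points. These two steps repeat until no centroid moves by more than the tolerance or the iteration limit is reached.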
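The assign-then-update loop can be sketched in plain Python. This is a minimal version of Lloyd's algorithm, assuming the caller supplies the starting centroids. The names `kmeans`, `max_iter`, and `tol` are hypothetical stand-ins for the calculator's iteration limit and tolerance, and keeping an empty cluster's previous centroid is one common choice, not necessarily the calculator's.

```python
import math


def kmeans(points, centroids, max_iter=100, tol=1e-4):
    """Lloyd's k-means on 2-D points, starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not members:
                updated.append(old)  # empty cluster: keep its previous centroid
                continue
            updated.append((sum(x for x, _ in members) / len(members),
                            sum(y for _, y in members) / len(members)))
        # Stop once no centroid moved farther than the tolerance.
        shift = max(math.dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return labels, centroids


# The example table, seeded with P1, P5, and P9 as starting centroids.
points = [(1.0, 1.2), (1.4, 0.8), (0.9, 1.5), (1.6, 1.1),
          (5.2, 5.0), (4.8, 5.4), (5.6, 4.9), (5.1, 5.8),
          (8.8, 1.5), (9.1, 1.0), (8.5, 2.0), (9.4, 1.7)]
labels, centers = kmeans(points, [points[0], points[4], points[8]])
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
```

With these starting points the groups separate on the first pass, and the loop stops on the second pass because no centroid moves.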
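SSE (the within-cluster sum of squared errors) measures compactness: for a fixed k, lower SSE means tighter clusters. The silhouette score ranges from -1 to 1. Values closer to 1 suggest cleaner separation, values near 0 indicate overlapping clusters, and negative values suggest that points may be in the wrong cluster.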
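For reference, these are the standard textbook definitions of both metrics, written for points p_i = (x_i, y_i) grouped into clusters C_1, ..., C_k with centroids mu_j. The page does not spell out its exact formulas, so read this as the conventional form rather than the calculator's confirmed implementation. In the silhouette formula, a(i) is the mean distance from p_i to the other points in its own cluster, and b(i) is the smallest mean distance from p_i to the points of any other cluster.

```latex
\mathrm{SSE} = \sum_{j=1}^{k} \sum_{p_i \in C_j} \lVert p_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{p_i \in C_j} p_i,
\qquad
\lVert p_i - \mu_j \rVert^2 = (x_i - \bar{x}_j)^2 + (y_i - \bar{y}_j)^2

s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}},
\qquad
S = \frac{1}{n} \sum_{i=1}^{n} s(i)
```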
How to Use This Calculator
- Enter one two-dimensional point per line in x,y format.
- Choose the number of clusters you want to detect.
- Adjust the iteration limit, tolerance, initialization method, and random seed.
- Enable normalization when x and y scales differ greatly.
- Submit the form to generate clusters, centroids, and quality metrics.
- Review the summary cards, tables, and scatter plot.
- Download CSV or PDF reports for records or sharing.
Frequently Asked Questions
1. What does k-means clustering do?
It groups similar points into k clusters by minimizing the sum of squared distances between each point and its assigned centroid. The method is fast, practical, and widely used for segmentation, pattern discovery, and exploratory analysis.
2. How do I choose the right k value?
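Start with a reasonable guess from domain knowledge, then compare results for several k values using SSE, silhouette score, and visual separation. SSE keeps falling as k grows (it reaches zero when every point is its own cluster), so a lower SSE alone does not mean a better k; look instead for the "elbow" where adding another cluster stops reducing SSE much. A useful k should create compact clusters without forcing clearly different groups together.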
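One way to compare k values is to run k-means for several k and watch how SSE falls. The self-contained sketch below does this on the example table. To keep the output repeatable it assumes a deterministic farthest-first start (the first point, then repeatedly the point farthest from every chosen center); that start and the names `farthest_first` and `kmeans_sse` are illustrative assumptions, not the calculator's initialization options.

```python
import math

POINTS = [(1.0, 1.2), (1.4, 0.8), (0.9, 1.5), (1.6, 1.1),
          (5.2, 5.0), (4.8, 5.4), (5.6, 4.9), (5.1, 5.8),
          (8.8, 1.5), (9.1, 1.0), (8.5, 2.0), (9.4, 1.7)]


def farthest_first(points, k):
    """Deterministic start: first point, then the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans_sse(points, k, max_iter=100):
    """Run Lloyd's k-means from a farthest-first start and return the final SSE."""
    centers = farthest_first(points, k)
    for _ in range(max_iter):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Recompute centers as means; an empty cluster keeps its old center.
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j] or [centers[j]]
            new.append((sum(x for x, _ in members) / len(members),
                        sum(y for _, y in members) / len(members)))
        if new == centers:
            break  # assignments no longer change the centers
        centers = new
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


for k in range(1, 6):
    print(k, round(kmeans_sse(POINTS, k), 3))
```

On this sample, SSE drops sharply up to k = 3 (about 2.39) and much more slowly after that, which is the elbow pattern pointing to three groups.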
3. Why would I enable normalization?
Normalization helps when one variable has a much larger scale than the other. Without scaling, the larger feature can dominate distance calculations and distort assignments.
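A common way to normalize is z-score standardization: subtract each coordinate's mean and divide by its standard deviation, so both axes end up with the same spread. The page does not say which scaling the calculator applies (min-max scaling is another common choice), so the function `standardize` and the income/age numbers below are hypothetical.

```python
import statistics


def standardize(points):
    """Z-score each coordinate: subtract its mean, divide by its population std. dev."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx = statistics.pstdev(xs) or 1.0  # guard against a constant column
    sy = statistics.pstdev(ys) or 1.0
    return [((x - mx) / sx, (y - my) / sy) for x, y in points]


# Hypothetical data: x is annual income, y is age. Raw distances are driven
# almost entirely by income because its values are thousands of times larger.
raw = [(52000, 25), (61000, 47), (58000, 33), (49000, 52)]
scaled = standardize(raw)
print(statistics.pstdev([x for x, _ in scaled]), statistics.pstdev([y for _, y in scaled]))
```

After scaling, both columns have a standard deviation of 1, so a one-unit move along either axis counts equally in the distance.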
4. What is SSE in this calculator?
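SSE is the within-cluster sum of squared errors: the squared distance from each point to its cluster's centroid, summed over all points. It measures how tightly points sit around their centroids, so smaller values indicate more compact clusters. Compare SSE only between runs on the same data, since it grows with the number of points and the scale of the coordinates.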
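The definition translates into a few lines of Python. This sketch recomputes each cluster's mean from the labels before summing squared distances; the function name `sse` is hypothetical, and the labels below simply follow the Expected Cluster column of the example table.

```python
def sse(points, labels):
    """Within-cluster sum of squared distances from each point to its cluster mean."""
    total = 0.0
    for cluster in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == cluster]
        cx = sum(x for x, _ in members) / len(members)
        cy = sum(y for _, y in members) / len(members)
        total += sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in members)
    return total


table = [(1.0, 1.2), (1.4, 0.8), (0.9, 1.5), (1.6, 1.1),
         (5.2, 5.0), (4.8, 5.4), (5.6, 4.9), (5.1, 5.8),
         (8.8, 1.5), (9.1, 1.0), (8.5, 2.0), (9.4, 1.7)]
groups = [0] * 4 + [1] * 4 + [2] * 4
print(round(sse(table[:4], [0, 0, 0, 0]), 4))  # 0.5775 (lower-left cluster only)
print(round(sse(table, groups), 4))            # 2.3925 (all three clusters)
```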
5. What does the silhouette score mean?
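The silhouette score compares how close a point is to the other points in its own cluster with how close it is to the points of the nearest other cluster. Scores range from -1 to 1. Higher positive values suggest cleaner boundaries, values near zero often indicate overlap, and negative values suggest a point may be in the wrong cluster.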
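The sketch below follows the standard silhouette definition and, by the usual convention, gives a score of 0 to points in single-member clusters. The calculator may treat such edge cases differently; the name `silhouette` and the four toy points are illustrative.

```python
import math


def silhouette(points, labels):
    """Mean silhouette score over all points (requires at least two clusters)."""
    def mean_dist(i, cluster):
        # Average distance from point i to every other point with this label.
        ds = [math.dist(points[i], q) for j, q in enumerate(points)
              if j != i and labels[j] == cluster]
        return sum(ds) / len(ds) if ds else None

    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, own in enumerate(labels):
        a = mean_dist(i, own)  # cohesion: mean distance within own cluster
        if a is None:
            scores.append(0.0)  # single-member cluster: s(i) = 0 by convention
            continue
        b = min(mean_dist(i, c) for c in clusters if c != own)  # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


toy = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(round(silhouette(toy, [0, 0, 1, 1]), 3))  # 0.9 (two tight, distant groups)
print(round(silhouette(toy, [0, 1, 0, 1]), 3))  # -0.448 (each cluster spans both groups)
```

Swapping the labels so that each cluster mixes both groups turns the score negative, which is the "wrong cluster" signal.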
6. Why can different runs give different results?
Runs can start from different initial centroids, especially with random initialization, and different starts may converge to different local solutions. Use a fixed seed for repeatable results.
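To show why a fixed seed makes runs repeatable, here are two common ways to choose starting centroids, each driven by its own seeded random generator: plain random sampling of input points, and k-means++ seeding, which draws each new center with probability proportional to its squared distance from the nearest center already chosen. The page does not name its initialization methods, so `random_init` and `kmeans_pp_init` are illustrative assumptions.

```python
import math
import random


def random_init(points, k, seed):
    """Pick k distinct input points as starting centroids, reproducibly for a seed."""
    return random.Random(seed).sample(points, k)


def kmeans_pp_init(points, k, seed):
    """k-means++ seeding: new centers are drawn with probability proportional to D(x)^2."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min(math.dist(p, c) for c in centers) ** 2 for p in points]
        if sum(d2) == 0:
            centers.append(rng.choice(points))  # every point coincides with a center
        else:
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


points = [(1.0, 1.2), (1.4, 0.8), (0.9, 1.5), (1.6, 1.1),
          (5.2, 5.0), (4.8, 5.4), (5.6, 4.9), (5.1, 5.8),
          (8.8, 1.5), (9.1, 1.0), (8.5, 2.0), (9.4, 1.7)]
print(random_init(points, 3, seed=42))
print(kmeans_pp_init(points, 3, seed=42))
```

Calling either function twice with the same seed returns the same centroids, so the k-means loop that follows also produces the same clusters.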
7. Can I use this calculator for more than two features?
This page is designed for two-dimensional input so the graph stays clear and practical. For higher dimensions, the same logic applies, but visualization and input handling become more complex.
8. When is k-means not a good choice?
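K-means is weaker when clusters are highly irregular in shape, very unequal in size, or contain extreme outliers. Because each centroid is a mean, a few distant points can pull it away from the bulk of its cluster. In those cases, other clustering methods may describe the data better.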
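A small worked example shows the outlier effect: one distant point drags the mean far from a tight group, while a coordinate-wise median barely moves. The median center is only an illustration of a more robust alternative (it is the idea behind k-medians), not a feature of this calculator, and the data are made up.

```python
import statistics

group = [(1, 1), (2, 1), (1, 2), (2, 2)]
with_outlier = group + [(50, 50)]  # one extreme point added to the same group


def mean_center(pts):
    """Centroid as used by k-means: the coordinate-wise mean."""
    return (statistics.fmean([x for x, _ in pts]), statistics.fmean([y for _, y in pts]))


def median_center(pts):
    """A more outlier-resistant center: the coordinate-wise median."""
    return (statistics.median([x for x, _ in pts]), statistics.median([y for _, y in pts]))


print(mean_center(group))           # (1.5, 1.5)
print(mean_center(with_outlier))    # (11.2, 11.2)
print(median_center(with_outlier))  # (2, 2)
```

The single outlier moves the mean about 9.7 units along each axis, away from all four real points, while the median stays inside the group.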