Simple, scalable and effective clustering via one-dimensional projections

Charikar M, Hu L, Henzinger MH, Vötsch M, Waingarten E. 2023. Simple, scalable and effective clustering via one-dimensional projections. 37th Conference on Neural Information Processing Systems. NeurIPS: Neural Information Processing Systems, NeurIPS, vol. 36.

Download
OA 2023_Neurips_Charikar.pdf 1.45 MB [Published Version]
Conference Paper | Published | English

Scopus indexed
Author
Charikar, Moses; Hu, Lunjia; Henzinger, MonikaISTA ; Vötsch, Maximilian; Waingarten, Erik
Series Title
NeurIPS
Abstract
Clustering is a fundamental problem in unsupervised machine learning with many applications in data analysis. Popular clustering algorithms such as Lloyd's algorithm and k-means++ can make Ω(ndk) time when clustering n points in a d-dimensional space (represented by an n×d matrix X) into k clusters. On massive datasets with moderate to large k, the multiplicative k factor can become very expensive. We introduce a simple randomized clustering algorithm that provably runs in expected time O(nnz(X)+nlogn) for arbitrary k. Here nnz(X) is the total number of non-zero entries in the input dataset X, which is upper bounded by nd and can be significantly smaller for sparse datasets. We prove that our algorithm achieves approximation ratio ˜O(k4) on any input dataset for the k-means objective, and our experiments show that the quality of the clusters found by our algorithm is usually much better than this worst-case bound. We use our algorithm for k-means clustering and for coreset construction; our experiments show that it gives a new tradeoff between running time and cluster quality compared to previous state-of-the-art methods for these tasks. Our theoretical analysis is based on novel results of independent interest. We show that the approximation ratio achieved after a random one-dimensional projection can be lifted to the original points and that k-means++ seeding can be implemented in expected time O(nlogn) in one dimension.
Publishing Year
Date Published
2023-12-15
Proceedings Title
37th Conference on Neural Information Processing Systems
Acknowledgement
Moses Charikar was supported by a Simons Investigator award. Lunjia Hu was supported by Moses Charikar’s and Omer Reingold’s Simons Investigators awards, Omer Reingold’s NSF Award IIS-1908774, and the Simons Foundation Collaboration on the Theory of Algorithmic Fairness. Part of this work was done while Erik Waingarten was a postdoc at Stanford University, supported by an NSF postdoctoral fellowship and by Moses Charikar’s Simons Investigator Award. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant agreement No. 101019564 “The Design of Modern Fully Dynamic Data Structures (MoDynStruct)” and the Austrian Science Fund (FWF) project Z 422-N, project “Static and Dynamic Hierarchical Graph Decompositions”, I 5982-N, and project “Fast Algorithms for a Reactive Network Layer (ReactNet)”, P 33775-N, with additional funding from the netidee SCIENCE Stiftung, 2020–2024.
Volume
36
Conference
NeurIPS: Neural Information Processing Systems
Conference Location
New Orleans, LA, United States
Conference Date
2023-12-10 – 2023-12-16
ISSN
IST-REx-ID

Cite this

Charikar M, Hu L, Henzinger MH, Vötsch M, Waingarten E. Simple, scalable and effective clustering via one-dimensional projections. In: 37th Conference on Neural Information Processing Systems. Vol 36. ; 2023.
Charikar, M., Hu, L., Henzinger, M. H., Vötsch, M., & Waingarten, E. (2023). Simple, scalable and effective clustering via one-dimensional projections. In 37th Conference on Neural Information Processing Systems (Vol. 36). New Orleans, LA, United States.
Charikar, Moses, Lunjia Hu, Monika H Henzinger, Maximilian Vötsch, and Erik Waingarten. “Simple, Scalable and Effective Clustering via One-Dimensional Projections.” In 37th Conference on Neural Information Processing Systems, Vol. 36, 2023.
M. Charikar, L. Hu, M. H. Henzinger, M. Vötsch, and E. Waingarten, “Simple, scalable and effective clustering via one-dimensional projections,” in 37th Conference on Neural Information Processing Systems, New Orleans, LA, United States, 2023, vol. 36.
Charikar M, Hu L, Henzinger MH, Vötsch M, Waingarten E. 2023. Simple, scalable and effective clustering via one-dimensional projections. 37th Conference on Neural Information Processing Systems. NeurIPS: Neural Information Processing Systems, NeurIPS, vol. 36.
Charikar, Moses, et al. “Simple, Scalable and Effective Clustering via One-Dimensional Projections.” 37th Conference on Neural Information Processing Systems, vol. 36, 2023.
All files available under the following license(s):
Creative Commons Attribution 4.0 International Public License (CC-BY 4.0):
Main File(s)
File Name
Access Level
OA Open Access
Date Uploaded
2024-05-22
MD5 Checksum
d169a147a2adf55878e0a99e36a9d468


Export

Marked Publications

Open Data ISTA Research Explorer

Sources

arXiv 2310.16752

Search this title in

Google Scholar