{"external_id":{"arxiv":["2310.16752"]},"day":"15","acknowledgement":"Moses Charikar was supported by a Simons Investigator award. Lunjia Hu was supported by Moses Charikar’s and Omer Reingold’s Simons Investigators awards, Omer Reingold’s NSF Award IIS-1908774, and the Simons Foundation Collaboration on the Theory of Algorithmic Fairness. Part of this work was done while Erik Waingarten was a postdoc at Stanford University, supported by an NSF postdoctoral fellowship and by Moses Charikar’s Simons Investigator Award. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant agreement No. 101019564 “The Design of Modern Fully Dynamic Data Structures (MoDynStruct)”) and the Austrian Science Fund (FWF) project Z 422-N, project “Static and Dynamic Hierarchical Graph Decompositions”, I 5982-N, and project “Fast Algorithms for a Reactive Network Layer (ReactNet)”, P 33775-N, with additional funding from the netidee SCIENCE Stiftung, 2020–2024.","year":"2023","has_accepted_license":"1","quality_controlled":"1","department":[{"_id":"MoHe"}],"type":"conference","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","language":[{"iso":"eng"}],"file":[{"success":1,"relation":"main_file","file_name":"2023_Neurips_Charikar.pdf","date_created":"2024-05-22T07:34:00Z","access_level":"open_access","content_type":"application/pdf","checksum":"d169a147a2adf55878e0a99e36a9d468","file_size":1445159,"creator":"dernst","file_id":"15416","date_updated":"2024-05-22T07:34:00Z"}],"abstract":[{"text":"Clustering is a fundamental problem in unsupervised machine learning with many applications in data analysis. Popular clustering algorithms such as Lloyd's algorithm and k-means++ can take Ω(ndk) time when clustering n points in a d-dimensional space (represented by an n×d matrix X) into k clusters. 
On massive datasets with moderate to large k, the multiplicative k factor can become very expensive. We introduce a simple randomized clustering algorithm that provably runs in expected time O(nnz(X) + n log n) for arbitrary k. Here nnz(X) is the total number of non-zero entries in the input dataset X, which is upper bounded by nd and can be significantly smaller for sparse datasets. We prove that our algorithm achieves approximation ratio Õ(k^4) on any input dataset for the k-means objective, and our experiments show that the quality of the clusters found by our algorithm is usually much better than this worst-case bound. We use our algorithm for k-means clustering and for coreset construction; our experiments show that it gives a new tradeoff between running time and cluster quality compared to previous state-of-the-art methods for these tasks. Our theoretical analysis is based on novel results of independent interest. We show that the approximation ratio achieved after a random one-dimensional projection can be lifted to the original points and that k-means++ seeding can be implemented in expected time O(n log n) in one dimension.","lang":"eng"}],"ddc":["000"],"volume":36,"alternative_title":["NeurIPS"],"date_published":"2023-12-15T00:00:00Z","conference":{"name":"NeurIPS: Neural Information Processing Systems","start_date":"2023-12-10","location":"New Orleans, LA, United States","end_date":"2023-12-16"},"citation":{"ista":"Charikar M, Hu L, Henzinger MH, Vötsch M, Waingarten E. 2023. Simple, scalable and effective clustering via one-dimensional projections. 37th Conference on Neural Information Processing Systems. NeurIPS: Neural Information Processing Systems, NeurIPS, vol. 36.","chicago":"Charikar, Moses, Lunjia Hu, Monika H Henzinger, Maximilian Vötsch, and Erik Waingarten. “Simple, Scalable and Effective Clustering via One-Dimensional Projections.” In 37th Conference on Neural Information Processing Systems, Vol. 
36, 2023.","ama":"Charikar M, Hu L, Henzinger MH, Vötsch M, Waingarten E. Simple, scalable and effective clustering via one-dimensional projections. In: 37th Conference on Neural Information Processing Systems. Vol 36. ; 2023.","mla":"Charikar, Moses, et al. “Simple, Scalable and Effective Clustering via One-Dimensional Projections.” 37th Conference on Neural Information Processing Systems, vol. 36, 2023.","short":"M. Charikar, L. Hu, M.H. Henzinger, M. Vötsch, E. Waingarten, in:, 37th Conference on Neural Information Processing Systems, 2023.","apa":"Charikar, M., Hu, L., Henzinger, M. H., Vötsch, M., & Waingarten, E. (2023). Simple, scalable and effective clustering via one-dimensional projections. In 37th Conference on Neural Information Processing Systems (Vol. 36). New Orleans, LA, United States.","ieee":"M. Charikar, L. Hu, M. H. Henzinger, M. Vötsch, and E. Waingarten, “Simple, scalable and effective clustering via one-dimensional projections,” in 37th Conference on Neural Information Processing Systems, New Orleans, LA, United States, 2023, vol. 
36."},"oa":1,"publication":"37th Conference on Neural Information Processing Systems","publication_status":"published","title":"Simple, scalable and effective clustering via one-dimensional projections","author":[{"first_name":"Moses","full_name":"Charikar, Moses","last_name":"Charikar"},{"first_name":"Lunjia","full_name":"Hu, Lunjia","last_name":"Hu"},{"full_name":"Henzinger, Monika H","id":"540c9bbd-f2de-11ec-812d-d04a5be85630","orcid":"0000-0002-5008-6530","first_name":"Monika H","last_name":"Henzinger"},{"last_name":"Vötsch","full_name":"Vötsch, Maximilian","first_name":"Maximilian"},{"last_name":"Waingarten","full_name":"Waingarten, Erik","first_name":"Erik"}],"tmp":{"name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)","short":"CC BY (4.0)","legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","image":"/images/cc_by.png"},"oa_version":"Published Version","scopus_import":"1","license":"https://creativecommons.org/licenses/by/4.0/","article_processing_charge":"Yes","date_updated":"2024-05-22T07:38:48Z","project":[{"name":"The design and evaluation of modern fully dynamic data structures","grant_number":"101019564","call_identifier":"H2020","_id":"bd9ca328-d553-11ed-ba76-dc4f890cfe62"},{"_id":"34def286-11ca-11ed-8bc3-da5948e1613c","grant_number":"Z00422","name":"Wittgenstein Award - Monika Henzinger"},{"grant_number":"I05982","name":"Static and Dynamic Hierarchical Graph Decompositions","_id":"bda196b2-d553-11ed-ba76-8e8ee6c21103"},{"name":"Fast Algorithms for a Reactive Network Layer","grant_number":"P33775 ","_id":"bd9e3a2e-d553-11ed-ba76-8aa684ce17fe"}],"month":"12","intvolume":" 36","file_date_updated":"2024-05-22T07:34:00Z","publication_identifier":{"issn":["1049-5258"]},"_id":"15364","ec_funded":1,"date_created":"2024-05-05T22:01:05Z","status":"public"}