Deterministic clustering in high dimensional spaces: Sketches and approximation

Cohen-Addad V, Saulpic D, Schwiegelshohn C. 2023. Deterministic clustering in high dimensional spaces: Sketches and approximation. 2023 IEEE 64th Annual Symposium on Foundations of Computer Science. FOCS: Symposium on Foundations of Computer Science, 1105–1130.

Download (ext.)

Conference Paper | Published | English

Scopus indexed
Author
Cohen-Addad, Vincent; Saulpic, DavidISTA; Schwiegelshohn, Chris
Abstract
In all state-of-the-art sketching and coreset techniques for clustering, as well as in the best known fixed-parameter tractable approximation algorithms, randomness plays a key role. For the classic k-median and k-means problems, there are no known deterministic dimensionality reduction procedure or coreset construction that avoid an exponential dependency on the input dimension d, the precision parameter $\varepsilon^{-1}$ or k. Furthermore, there is no coreset construction that succeeds with probability $1-1/n$ and whose size does not depend on the number of input points, n. This has led researchers in the area to ask what is the power of randomness for clustering sketches [Feldman WIREs Data Mining Knowl. Discov’20].Similarly, the best approximation ratio achievable deterministically without a complexity exponential in the dimension are $1+\sqrt{2}$ for k-median [Cohen-Addad, Esfandiari, Mirrokni, Narayanan, STOC’22] and 6.12903 for k-means [Grandoni, Ostrovsky, Rabani, Schulman, Venkat, Inf. Process. Lett.’22]. Those are the best results, even when allowing a complexity FPT in the number of clusters k: this stands in sharp contrast with the $(1+\varepsilon)$-approximation achievable in that case, when allowing randomization.In this paper, we provide deterministic sketches constructions for clustering, whose size bounds are close to the best-known randomized ones. We show how to compute a dimension reduction onto $\varepsilon^{-O(1)} \log k$ dimensions in time $k^{O\left(\varepsilon^{-O(1)}+\log \log k\right)}$ poly $(n d)$, and how to build a coreset of size $O\left(k^{2} \log ^{3} k \varepsilon^{-O(1)}\right)$ in time $2^{\varepsilon^{O(1)} k \log ^{3} k}+k^{O\left(\varepsilon^{-O(1)}+\log \log k\right)}$ poly $(n d)$. In the case where k is small, this answers an open question of [Feldman WIDM’20] and [Munteanu and Schwiegelshohn, Künstliche Intell. ’18] on whether it is possible to efficiently compute coresets deterministically.We also construct a deterministic algorithm for computing $(1+$ $\varepsilon)$-approximation to k-median and k-means in high dimensional Euclidean spaces in time $2^{k^{2} \log ^{3} k / \varepsilon^{O(1)}}$ poly $(n d)$, close to the best randomized complexity of $2^{(k / \varepsilon)^{O(1)}}$ nd (see [Kumar, Sabharwal, Sen, JACM 10] and [Bhattacharya, Jaiswal, Kumar, TCS’18]).Furthermore, our new insights on sketches also yield a randomized coreset construction that uses uniform sampling, that immediately improves over the recent results of [Braverman et al. FOCS ’22] by a factor k.
Publishing Year
Date Published
2023-12-22
Proceedings Title
2023 IEEE 64th Annual Symposium on Foundations of Computer Science
Publisher
IEEE
Acknowledgement
D. Sauplic has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 101034413, and Grant agreement No. 101019564 “The Design of Modern Fully Dynamic Data Structures (MoDynStruct)”. C. Schwiegelshohn acknowledges the support of the Independent Research Fund Denmark (DFF) under a Sapere Aude Research Leader grant No 1051-00106B.
Page
1105-1130
Conference
FOCS: Symposium on Foundations of Computer Science
Conference Location
Santa Cruz, CA, United States
Conference Date
2023-11-06 – 2023-11-09
IST-REx-ID

Cite this

Cohen-Addad V, Saulpic D, Schwiegelshohn C. Deterministic clustering in high dimensional spaces: Sketches and approximation. In: 2023 IEEE 64th Annual Symposium on Foundations of Computer Science. IEEE; 2023:1105-1130. doi:10.1109/focs57990.2023.00066
Cohen-Addad, V., Saulpic, D., & Schwiegelshohn, C. (2023). Deterministic clustering in high dimensional spaces: Sketches and approximation. In 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (pp. 1105–1130). Santa Cruz, CA, United States: IEEE. https://doi.org/10.1109/focs57990.2023.00066
Cohen-Addad, Vincent, David Saulpic, and Chris Schwiegelshohn. “Deterministic Clustering in High Dimensional Spaces: Sketches and Approximation.” In 2023 IEEE 64th Annual Symposium on Foundations of Computer Science, 1105–30. IEEE, 2023. https://doi.org/10.1109/focs57990.2023.00066.
V. Cohen-Addad, D. Saulpic, and C. Schwiegelshohn, “Deterministic clustering in high dimensional spaces: Sketches and approximation,” in 2023 IEEE 64th Annual Symposium on Foundations of Computer Science, Santa Cruz, CA, United States, 2023, pp. 1105–1130.
Cohen-Addad V, Saulpic D, Schwiegelshohn C. 2023. Deterministic clustering in high dimensional spaces: Sketches and approximation. 2023 IEEE 64th Annual Symposium on Foundations of Computer Science. FOCS: Symposium on Foundations of Computer Science, 1105–1130.
Cohen-Addad, Vincent, et al. “Deterministic Clustering in High Dimensional Spaces: Sketches and Approximation.” 2023 IEEE 64th Annual Symposium on Foundations of Computer Science, IEEE, 2023, pp. 1105–30, doi:10.1109/focs57990.2023.00066.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]

Link(s) to Main File(s)
Access Level
OA Open Access

Export

Marked Publications

Open Data ISTA Research Explorer

Sources

arXiv 2310.04076

Search this title in

Google Scholar