---
_id: '18214'
abstract:
- lang: eng
  text: "Graph sparsification is a technique that approximates a given graph by a
    sparse graph with a subset of vertices and/or edges. The goal of an effective
    sparsification algorithm is to maintain specific graph properties relevant to
    the downstream task while minimizing the graph's size. Graph algorithms often
    suffer from long execution time due to the irregularity and the large real-world
    graph size. Graph sparsification can be applied to greatly reduce the run time
    of graph algorithms by substituting the full graph with a much smaller sparsified
    graph, without significantly degrading the output quality. However, the interaction
    between numerous sparsifiers and graph properties is not widely explored, and
    the potential of graph sparsification is not fully understood.\r\n\r\nIn
    this work, we cover 16 widely-used graph metrics, 12 representative graph sparsification
    algorithms, and 14 real-world input graphs spanning various categories, exhibiting
    diverse characteristics, sizes, and densities. We developed a framework to extensively
    assess the performance of these sparsification algorithms against graph metrics,
    and provide insights into the results. Our study shows that there is no one sparsifier
    that performs the best in preserving all graph properties, e.g. sparsifiers that
    preserve distance-related graph properties (eccentricity) struggle to perform
    well on Graph Neural Networks (GNN). This paper presents a comprehensive experimental
    study evaluating the performance of sparsification algorithms in preserving essential
    graph metrics. The insights inform future research in incorporating matching graph
    sparsification to graph algorithms to maximize benefits while minimizing quality
    degradation. Furthermore, we provide a framework to facilitate the future evaluation
    of evolving sparsification algorithms, graph metrics, and ever-growing graph data."
article_processing_charge: No
article_type: original
arxiv: 1
author:
- first_name: Yuhan
  full_name: Chen, Yuhan
  last_name: Chen
- first_name: Haojie
  full_name: Ye, Haojie
  last_name: Ye
- first_name: Sanketh
  full_name: Vedula, Sanketh
  last_name: Vedula
- first_name: Alexander
  full_name: Bronstein, Alexander
  id: 58f3726e-7cba-11ef-ad8b-e6e8cb3904e6
  last_name: Bronstein
  orcid: 0000-0001-9699-8730
- first_name: Ronald
  full_name: Dreslinski, Ronald
  last_name: Dreslinski
- first_name: Trevor
  full_name: Mudge, Trevor
  last_name: Mudge
- first_name: Nishil
  full_name: Talati, Nishil
  last_name: Talati
citation:
  ama: Chen Y, Ye H, Vedula S, et al. Demystifying graph sparsification algorithms
    in graph properties preservation. <i>Proceedings of the VLDB Endowment</i>. 2023;17(3):427-440.
    doi:<a href="https://doi.org/10.14778/3632093.3632106">10.14778/3632093.3632106</a>
  apa: Chen, Y., Ye, H., Vedula, S., Bronstein, A. M., Dreslinski, R., Mudge, T.,
    &#38; Talati, N. (2023). Demystifying graph sparsification algorithms in graph
    properties preservation. <i>Proceedings of the VLDB Endowment</i>. Association
    for Computing Machinery. <a href="https://doi.org/10.14778/3632093.3632106">https://doi.org/10.14778/3632093.3632106</a>
  chicago: Chen, Yuhan, Haojie Ye, Sanketh Vedula, Alex M. Bronstein, Ronald Dreslinski,
    Trevor Mudge, and Nishil Talati. “Demystifying Graph Sparsification Algorithms
    in Graph Properties Preservation.” <i>Proceedings of the VLDB Endowment</i>. Association
    for Computing Machinery, 2023. <a href="https://doi.org/10.14778/3632093.3632106">https://doi.org/10.14778/3632093.3632106</a>.
  ieee: Y. Chen <i>et al.</i>, “Demystifying graph sparsification algorithms in graph
    properties preservation,” <i>Proceedings of the VLDB Endowment</i>, vol. 17, no.
    3. Association for Computing Machinery, pp. 427–440, 2023.
  ista: Chen Y, Ye H, Vedula S, Bronstein AM, Dreslinski R, Mudge T, Talati N. 2023.
    Demystifying graph sparsification algorithms in graph properties preservation.
    Proceedings of the VLDB Endowment. 17(3), 427–440.
  mla: Chen, Yuhan, et al. “Demystifying Graph Sparsification Algorithms in Graph
    Properties Preservation.” <i>Proceedings of the VLDB Endowment</i>, vol. 17, no.
    3, Association for Computing Machinery, 2023, pp. 427–40, doi:<a href="https://doi.org/10.14778/3632093.3632106">10.14778/3632093.3632106</a>.
  short: Y. Chen, H. Ye, S. Vedula, A.M. Bronstein, R. Dreslinski, T. Mudge, N. Talati,
    Proceedings of the VLDB Endowment 17 (2023) 427–440.
date_created: 2024-10-08T12:48:57Z
date_published: 2023-11-01T00:00:00Z
date_updated: 2024-10-09T11:28:33Z
day: '01'
doi: 10.14778/3632093.3632106
extern: '1'
external_id:
  arxiv:
  - '2311.12314'
intvolume: '        17'
issue: '3'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2311.12314
month: '11'
oa: 1
oa_version: Preprint
page: 427-440
publication: Proceedings of the VLDB Endowment
publication_identifier:
  issn:
  - 2150-8097
publication_status: published
publisher: Association for Computing Machinery
quality_controlled: '1'
scopus_import: '1'
status: public
title: Demystifying graph sparsification algorithms in graph properties preservation
type: journal_article
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 17
year: '2023'
...
---
_id: '11878'
abstract:
- lang: eng
  text: "Given only the URL of a web page, can we identify its language? This is the
    question that we examine in this paper.\r\nSuch a language classifier is, for
    example, useful for crawlers of web search engines, which frequently try to satisfy
    certain language quotas. To determine the language of uncrawled web pages, they
    have to download the page, which might be wasteful, if the page is not in the
    desired language. With URL-based language classifiers these redundant downloads
    can be avoided.\r\n\r\nWe apply a variety of machine learning algorithms to the
    language identification task and evaluate their performance in extensive experiments
    for five languages: English, French, German, Spanish and Italian. Our best methods
    achieve an F-measure, averaged over all languages, of around .90 for both a random
    sample of 1,260 web pages from a large web crawl and for 25k pages from the ODP
    directory. For 5k pages of web search engine results we even achieve an F-measure
    of .96. The achieved recall for these collections is .93, .88 and .95 respectively.
    Two independent human evaluators performed considerably worse on the task, with
    an F-measure of .75 and a typical recall of a mere .67. Using only country-code
    top-level domains, such as .de or .fr, yields good precision, but a typical recall
    of below .60 and an F-measure of around .68."
article_processing_charge: No
article_type: original
author:
- first_name: Eda
  full_name: Baykan, Eda
  last_name: Baykan
- first_name: Monika H
  full_name: Henzinger, Monika H
  id: 540c9bbd-f2de-11ec-812d-d04a5be85630
  last_name: Henzinger
  orcid: 0000-0002-5008-6530
- first_name: Ingmar
  full_name: Weber, Ingmar
  last_name: Weber
citation:
  ama: Baykan E, Henzinger M, Weber I. Web page language identification based on URLs.
    <i>Proceedings of the VLDB Endowment</i>. 2008;1(1):176-187. doi:<a href="https://doi.org/10.14778/1453856.1453880">10.14778/1453856.1453880</a>
  apa: Baykan, E., Henzinger, M., &#38; Weber, I. (2008). Web page language identification
    based on URLs. <i>Proceedings of the VLDB Endowment</i>. Association for Computing
    Machinery. <a href="https://doi.org/10.14778/1453856.1453880">https://doi.org/10.14778/1453856.1453880</a>
  chicago: Baykan, Eda, Monika Henzinger, and Ingmar Weber. “Web Page Language Identification
    Based on URLs.” <i>Proceedings of the VLDB Endowment</i>. Association for Computing
    Machinery, 2008. <a href="https://doi.org/10.14778/1453856.1453880">https://doi.org/10.14778/1453856.1453880</a>.
  ieee: E. Baykan, M. Henzinger, and I. Weber, “Web page language identification based
    on URLs,” <i>Proceedings of the VLDB Endowment</i>, vol. 1, no. 1. Association
    for Computing Machinery, pp. 176–187, 2008.
  ista: Baykan E, Henzinger M, Weber I. 2008. Web page language identification based
    on URLs. Proceedings of the VLDB Endowment. 1(1), 176–187.
  mla: Baykan, Eda, et al. “Web Page Language Identification Based on URLs.” <i>Proceedings
    of the VLDB Endowment</i>, vol. 1, no. 1, Association for Computing Machinery,
    2008, pp. 176–87, doi:<a href="https://doi.org/10.14778/1453856.1453880">10.14778/1453856.1453880</a>.
  short: E. Baykan, M. Henzinger, I. Weber, Proceedings of the VLDB Endowment 1 (2008)
    176–187.
date_created: 2022-08-16T13:10:11Z
date_published: 2008-08-01T00:00:00Z
date_updated: 2024-11-06T12:21:34Z
day: '01'
doi: 10.14778/1453856.1453880
extern: '1'
intvolume: '         1'
issue: '1'
language:
- iso: eng
month: '08'
oa_version: None
page: 176-187
publication: Proceedings of the VLDB Endowment
publication_identifier:
  issn:
  - 2150-8097
publication_status: published
publisher: Association for Computing Machinery
quality_controlled: '1'
scopus_import: '1'
status: public
title: Web page language identification based on URLs
type: journal_article
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 1
year: '2008'
...
