---
_id: '11770'
abstract:
- lang: eng
  text: 'We compare several algorithms for identifying mirrored hosts on the World
    Wide Web. The algorithms operate on the basis of URL strings and linkage data:
    the type of information about Web pages easily available from Web proxies and
    crawlers. Identification of mirrored hosts can improve Web-based information retrieval
    in several ways: first, by identifying mirrored hosts, search engines can avoid
    storing and returning duplicate documents. Second, several new information retrieval
    techniques for the Web make inferences based on the explicit links among hypertext
    documents—mirroring perturbs their graph model and degrades performance. Third,
    mirroring information can be used to redirect users to alternate mirror sites
    to compensate for various failures, and can thus improve the performance of Web
    browsers and proxies. We evaluated four classes of “top-down” algorithms for detecting
    mirrored host pairs (that is, algorithms that are based on page attributes such
    as URL, IP address, and hyperlinks between pages, and not on the page content)
    on a collection of 140 million URLs (on 230,000 hosts) and their associated connectivity
    information. Our best approach combines five algorithms and achieved a precision
    of 0.57 at a recall of 0.86 when considering 100,000 ranked host pairs.'
article_processing_charge: No
article_type: original
author:
- first_name: Krishna
  full_name: Bharat, Krishna
  last_name: Bharat
- first_name: Andrei
  full_name: Broder, Andrei
  last_name: Broder
- first_name: Jeffrey
  full_name: Dean, Jeffrey
  last_name: Dean
- first_name: Monika H
  full_name: Henzinger, Monika H
  id: 540c9bbd-f2de-11ec-812d-d04a5be85630
  last_name: Henzinger
  orcid: 0000-0002-5008-6530
citation:
  ama: Bharat K, Broder A, Dean J, Henzinger M. A comparison of techniques to find
    mirrored hosts on the WWW. <i>Journal of the American Society for Information
    Science</i>. 2000;51(12):1114-1122. doi:<a href="https://doi.org/10.1002/1097-4571(2000)9999:9999&#60;::aid-asi1025&#62;3.0.co;2-0">10.1002/1097-4571(2000)9999:9999&#60;::aid-asi1025&#62;3.0.co;2-0</a>
  apa: Bharat, K., Broder, A., Dean, J., &#38; Henzinger, M. (2000). A comparison
    of techniques to find mirrored hosts on the WWW. <i>Journal of the American Society
    for Information Science</i>. Wiley. <a href="https://doi.org/10.1002/1097-4571(2000)9999:9999&#60;::aid-asi1025&#62;3.0.co;2-0">https://doi.org/10.1002/1097-4571(2000)9999:9999&#60;::aid-asi1025&#62;3.0.co;2-0</a>
  chicago: Bharat, Krishna, Andrei Broder, Jeffrey Dean, and Monika Henzinger. “A
    Comparison of Techniques to Find Mirrored Hosts on the WWW.” <i>Journal of the
    American Society for Information Science</i>. Wiley, 2000. <a href="https://doi.org/10.1002/1097-4571(2000)9999:9999&#60;::aid-asi1025&#62;3.0.co;2-0">https://doi.org/10.1002/1097-4571(2000)9999:9999&#60;::aid-asi1025&#62;3.0.co;2-0</a>.
  ieee: K. Bharat, A. Broder, J. Dean, and M. Henzinger, “A comparison of techniques
    to find mirrored hosts on the WWW,” <i>Journal of the American Society for Information
    Science</i>, vol. 51, no. 12. Wiley, pp. 1114–1122, 2000.
  ista: Bharat K, Broder A, Dean J, Henzinger M. 2000. A comparison of techniques
    to find mirrored hosts on the WWW. Journal of the American Society for Information
    Science. 51(12), 1114–1122.
  mla: Bharat, Krishna, et al. “A Comparison of Techniques to Find Mirrored Hosts
    on the WWW.” <i>Journal of the American Society for Information Science</i>, vol.
    51, no. 12, Wiley, 2000, pp. 1114–22, doi:<a href="https://doi.org/10.1002/1097-4571(2000)9999:9999&#60;::aid-asi1025&#62;3.0.co;2-0">10.1002/1097-4571(2000)9999:9999&#60;::aid-asi1025&#62;3.0.co;2-0</a>.
  short: K. Bharat, A. Broder, J. Dean, M. Henzinger, Journal of the American Society
    for Information Science 51 (2000) 1114–1122.
date_created: 2022-08-08T12:57:37Z
date_published: 2000-10-01T00:00:00Z
date_updated: 2024-11-06T08:14:37Z
day: '01'
doi: 10.1002/1097-4571(2000)9999:9999<::aid-asi1025>3.0.co;2-0
extern: '1'
intvolume: '51'
issue: '12'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.1002/1097-4571(2000)9999:9999<::aid-asi1025>3.0.co;2-0
month: '10'
oa: 1
oa_version: Published Version
page: 1114-1122
publication: Journal of the American Society for Information Science
publication_identifier:
  issn:
  - 0002-8231
  - 1097-4571
publication_status: published
publisher: Wiley
quality_controlled: '1'
scopus_import: '1'
status: public
title: A comparison of techniques to find mirrored hosts on the WWW
type: journal_article
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 51
year: '2000'
...
