---
res:
  bibo_abstract:
  - Word embeddings represent language vocabularies as clouds of d-dimensional points.
    We investigate how information is conveyed by the general shape of these clouds,
    rather than by the semantic meaning of each individual token. Specifically, we use
    the notion of persistent homology from topological data analysis (TDA) to measure
    the distances between language pairs from the shape of their unlabeled embeddings.
    These distances quantify the degree of non-isometry of the embeddings. To distinguish
    whether these differences are random training errors or capture real information
    about the languages, we use the computed distance matrices to construct language
    phylogenetic trees over 81 Indo-European languages. Careful evaluation shows that
    our reconstructed trees exhibit strong and statistically significant similarities
    to the reference tree.@eng
  bibo_authorlist:
  - foaf_Person:
      foaf_givenName: Ondrej
      foaf_name: Draganov, Ondrej
      foaf_surname: Draganov
      foaf_workInfoHomepage: http://www.librecat.org/personId=2B23F01E-F248-11E8-B48F-1D18A9856A87
    orcid: 0000-0003-0464-3823
  - foaf_Person:
      foaf_givenName: Steven
      foaf_name: Skiena, Steven
      foaf_surname: Skiena
  bibo_doi: 10.18653/v1/2024.findings-emnlp.705
  dct_date: 2024^xs_gYear
  dct_language: eng
  dct_publisher: Association for Computational Linguistics@
  dct_title: 'The shape of word embeddings: Quantifying non-isometry with topological
    data analysis@'
...
