Towards understanding the word sensitivity of attention layers: A study via random features

Bombari S, Mondelli M. 2024. Towards understanding the word sensitivity of attention layers: A study via random features. 41st International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 235, 4300–4328.

Download (ext.)
Conference Paper | Published | English

Scopus indexed

Corresponding author has ISTA affiliation

Department
Series Title
PMLR
Abstract
Understanding the reasons behind the exceptional success of transformers requires a better analysis of why attention layers are suitable for NLP tasks. In particular, such tasks require predictive models to capture contextual meaning which often depends on one or a few words, even if the sentence is long. Our work studies this key property, dubbed word sensitivity (WS), in the prototypical setting of random features. We show that attention layers enjoy high WS, namely, there exists a vector in the space of embeddings that largely perturbs the random attention features map. The argument critically exploits the role of the softmax in the attention layer, highlighting its benefit compared to other activations (e.g., ReLU). In contrast, the WS of standard random features is of order 1/√n, n being the number of words in the textual sample, and thus it decays with the length of the context. We then translate these results on word sensitivity into generalization bounds: due to their low WS, standard random features provably cannot learn to distinguish between two sentences that differ only in a single word; in contrast, due to their high WS, random attention features have higher generalization capabilities. We validate our theoretical results with experimental evidence over the BERT-Base word embeddings of the IMDB review dataset.
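As a rough, hypothetical illustration of the word-sensitivity notion described in the abstract (this is not the authors' code, and all names, dimensions, and scalings below are illustrative assumptions), the sketch perturbs a single word embedding and measures the relative change of a standard random-features map versus a random attention-features map. Note that the paper's high-WS result for attention concerns the existence of a worst-case perturbation; this probe simply uses a random direction of comparable norm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 64, 128, 512  # words per sample, embedding dim, number of random features (illustrative)

X = rng.standard_normal((n, d)) / np.sqrt(d)  # a textual sample: n word embeddings

# Standard random features: elementwise ReLU of a random projection of the flattened sample.
W = rng.standard_normal((k, n * d)) / np.sqrt(n * d)
def random_features(Z):
    return np.maximum(W @ Z.reshape(-1), 0.0)

# Random attention features: softmax-normalized random query/key scores applied to the sample.
WQ = rng.standard_normal((d, d)) / np.sqrt(d)
WK = rng.standard_normal((d, d)) / np.sqrt(d)
def random_attention_features(Z):
    scores = Z @ WQ @ WK.T @ Z.T / np.sqrt(d)
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)
    return (A @ Z).reshape(-1)

def word_sensitivity(feat, Z, word_idx=0):
    # Perturb a single word embedding and report the relative change of the feature map.
    delta = rng.standard_normal(d)
    delta *= np.linalg.norm(Z[word_idx]) / np.linalg.norm(delta)  # perturbation with the same norm as the word
    Z_pert = Z.copy()
    Z_pert[word_idx] = Z[word_idx] + delta
    return np.linalg.norm(feat(Z_pert) - feat(Z)) / np.linalg.norm(feat(Z))

print("standard RF :", word_sensitivity(random_features, X))
print("attention RF:", word_sensitivity(random_attention_features, X))
```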
Publishing Year
2024
Date Published
2024-07-30
Proceedings Title
41st International Conference on Machine Learning
Publisher
ML Research Press
Acknowledgement
The authors were partially supported by the 2019 Lopez-Loreta prize, and they would like to thank Mohammad Hossein Amani, Lorenzo Beretta, and Clement Rebuffel for helpful discussions.
Volume
235
Page
4300–4328
Conference
ICML: International Conference on Machine Learning
Conference Location
Vienna, Austria
Conference Date
2024-07-21 – 2024-07-27
eISSN
IST-REx-ID

Cite this

Bombari S, Mondelli M. Towards understanding the word sensitivity of attention layers: A study via random features. In: 41st International Conference on Machine Learning. Vol 235. ML Research Press; 2024:4300-4328.
Bombari, S., & Mondelli, M. (2024). Towards understanding the word sensitivity of attention layers: A study via random features. In 41st International Conference on Machine Learning (Vol. 235, pp. 4300–4328). Vienna, Austria: ML Research Press.
Bombari, Simone, and Marco Mondelli. “Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features.” In 41st International Conference on Machine Learning, 235:4300–4328. ML Research Press, 2024.
S. Bombari and M. Mondelli, “Towards understanding the word sensitivity of attention layers: A study via random features,” in 41st International Conference on Machine Learning, Vienna, Austria, 2024, vol. 235, pp. 4300–4328.
Bombari S, Mondelli M. 2024. Towards understanding the word sensitivity of attention layers: A study via random features. 41st International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 235, 4300–4328.
Bombari, Simone, and Marco Mondelli. “Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features.” 41st International Conference on Machine Learning, vol. 235, ML Research Press, 2024, pp. 4300–28.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]

Link(s) to Main File(s)
Access Level
OA Open Access


Sources

arXiv 2402.02969
