Can LLMs separate instructions from data? And what do we even mean by that?
Zverev E, Abdelnabi S, Tabesh S, Fritz M, Lampert C. 2024. Can LLMs separate instructions from data? And what do we even mean by that? arXiv, 2403.06833.
Preprint | Published | English
Author
Zverev, Egor; Abdelnabi, Sahar; Tabesh, Soroush; Fritz, Mario; Lampert, Christoph
Corresponding author has ISTA affiliation
Department
Abstract
Instruction-tuned Large Language Models (LLMs) show impressive results in numerous practical applications, but they lack essential safety features that are common in other areas of computer science, particularly an explicit separation of instructions and data. This makes them vulnerable to manipulations such as indirect prompt injections and generally unsuitable for safety-critical tasks. Surprisingly, there is currently no established definition or benchmark to quantify this phenomenon. In this work, we close this gap by introducing a formal measure for instruction-data separation and an empirical variant that is calculable from a model's outputs. We also present a new dataset, SEP, that allows estimating the measure for real-world models. Our results on various LLMs show that the problem of instruction-data separation is real: all models fail to achieve high separation, and canonical mitigation techniques, such as prompt engineering and fine-tuning, either fail to substantially improve separation or reduce model utility. The source code and SEP dataset are openly accessible at https://github.com/egozverev/Shold-It-Be-Executed-Or-Processed.
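To make the empirical measure described above concrete, the following is a minimal sketch of one way such a separation score could be estimated, assuming a generic chat-style model interface. Everything named here (`query_model`, the (task, data, probe, witness) record layout, and the witness-substring check) is a hypothetical illustration for readability, not the paper's actual scoring rule; the authors' implementation is in the linked repository.

```python
# Minimal sketch of an empirical instruction-data separation score.
# `query_model` is a hypothetical stand-in for any API that accepts
# separate "instruction" and "data" strings and returns model output.

from typing import Callable, List, Tuple


def executed(output: str, witness: str) -> bool:
    """Heuristic: the probe counts as executed if its witness string
    appears in the output (an assumption, not the paper's exact rule)."""
    return witness.lower() in output.lower()


def empirical_separation(
    query_model: Callable[[str, str], str],
    dataset: List[Tuple[str, str, str, str]],  # (task, data, probe, witness)
) -> float:
    """Fraction of probes the model executes when placed in the
    instruction channel but ignores when placed in the data channel."""
    separated, considered = 0, 0
    for task, data, probe, witness in dataset:
        # Probe alongside the instructions: the model *should* execute it.
        out_instr = query_model(task + "\n" + probe, data)
        # Probe inside the data: the model should treat it as plain text.
        out_data = query_model(task, data + "\n" + probe)
        if executed(out_instr, witness):  # count only probes the model can follow
            considered += 1
            if not executed(out_data, witness):
                separated += 1
    return separated / considered if considered else 0.0
```

Note that this sketch conditions on probes the model actually follows in the instruction position, so a model is not credited with separation merely because it is unable to execute the probe at all.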
Publishing Year
2024
Date Published
2024-03-01
Journal Title
arXiv
Acknowledgement
The authors would like to sincerely thank Juan Rocamonde for valuable feedback on our manuscript. We acknowledge the support from the Scientific Service Units (SSU) of ISTA through resources provided by Scientific Computing (SciComp). We thank Dan Alistarh for providing us with computational resources. This work was partially funded by the German Federal Ministry of Education and Research (BMBF) under the grant AIgenCY (16KIS2012) and ELSA – European Lighthouse on Secure and Safe AI, funded by the European Union under grant agreement No. 101070617. Views and opinions expressed are however those of the authors only and do not necessarily reflect those of the European Union or European Commission. Neither the European Union nor the European Commission can be held responsible for them.
Acknowledged SSUs
Scientific Computing (SciComp)
Article Number
2403.06833
IST-REx-ID
Cite this
Zverev E, Abdelnabi S, Tabesh S, Fritz M, Lampert C. Can LLMs separate instructions from data? And what do we even mean by that? arXiv. 2024. doi:10.48550/arXiv.2403.06833
Zverev, E., Abdelnabi, S., Tabesh, S., Fritz, M., & Lampert, C. (2024). Can LLMs separate instructions from data? And what do we even mean by that? arXiv. https://doi.org/10.48550/arXiv.2403.06833
Zverev, Egor, Sahar Abdelnabi, Soroush Tabesh, Mario Fritz, and Christoph Lampert. “Can LLMs Separate Instructions from Data? And What Do We Even Mean by That?” ArXiv, 2024. https://doi.org/10.48550/arXiv.2403.06833.
E. Zverev, S. Abdelnabi, S. Tabesh, M. Fritz, and C. Lampert, “Can LLMs separate instructions from data? And what do we even mean by that?,” arXiv. 2024.
Zverev E, Abdelnabi S, Tabesh S, Fritz M, Lampert C. 2024. Can LLMs separate instructions from data? And what do we even mean by that? arXiv, 2403.06833.
Zverev, Egor, et al. “Can LLMs Separate Instructions from Data? And What Do We Even Mean by That?” ArXiv, 2403.06833, 2024, doi:10.48550/arXiv.2403.06833.
All files available under the following license(s):
Creative Commons Attribution-ShareAlike 4.0 International Public License (CC BY-SA 4.0)
Main File(s)
File Name
2403.06833v3.pdf
530.97 KB
Access Level
Open Access
Date Uploaded
2025-02-20
MD5 Checksum
35eb43968684b87be59144603ef10af0
Sources
arXiv 2403.06833