Large Language Models (LLMs) have made remarkable progress in recent years. Most recently, they have shown that they can answer natural language questions at a surprisingly high level. At the same time, clever prompting can change these models' behavior, which blurs the line between data and instructions. From "traditional" cybersecurity, we know that this is a problem, and the importance of security boundaries between trusted and untrusted inputs for LLMs has been underestimated. We show that Prompt Injection is a serious security threat that needs to be addressed as models are deployed to new use cases and interface with more systems.
If allowed by the user, Bing Chat can see currently open websites. We show that an attacker can plant an injection in a website the user is visiting, which silently turns Bing Chat into a Social Engineer who seeks out and exfiltrates personal information. The user doesn't have to ask about the website or do anything except interact with Bing Chat while the website is open in the browser.
Microsoft has since implemented various mitigations against this threat, though their effectiveness remains unclear and keeps changing.
Turning Bing Chat into a Data Pirate
This demonstration on Bing Chat is only a small part of the new attack techniques presented in our recent paper (linked below).
In Edge, a user opened a prepared website containing an injection (it could also be placed on a social media site).
You can see the conversation the user had with Bing Chat while the tab was open.
The website includes a prompt which is read by Bing and changes its behavior to access user information and send it to an attacker.
This is an example of "Indirect Prompt Injection", a new attack described in our paper.
The pirate accent is optional. The injection itself is simply a piece of regular text with a font size of 0. You can find an image of the injected text below, too (we show an image because otherwise Bing Chat could read the text and be injected).
You can inspect the actual website that is opened here.