Indirect Prompt Injection Threats

Large Language Models (LLM) have made amazing progress in recent years. Most recently, they have demonstrated to answer natural language questions at a surprising performance level. In addition, by clever prompting, these models can change their behavior. In this way, these models blur the line between data and instruction. From "traditional" cybersecurity, we know that this is a problem. The importance of security boundaries between trusted and untrusted inputs for LLMs was underestimated. We show that Prompt Injection is a serious security threat that needs to be addressed as models are deployed to new use-cases and interface with more systems.

If allowed by the user, Bing Chat can see currently open websites. We show that an attacker can plant an injection in a website the user is visiting, which silently turns Bing Chat into a Social Engineer who seeks out and exfiltrates personal information. The user doesn't have to ask about the website or do anything except interact with Bing Chat while the website is opened in the browser.

Turning Bing Chat into a scammer trying to get the user's payment details
Microsoft prevents content from GitHub pages domains from being ingested by Bing Chat at the present time.

Turning Bing Chat into a Data Pirate

This demonstration on Bing Chat is only a small part of new attack techniques presented in our recent paper (linked below).

A user opened a prepared website containing an injection (could also be on a social media site) in Edge. You can see the conversation the user had with Bing Chat while the tab was open. The website includes a prompt which is read by Bing and changes its behavior to access user information and send it to an attacker. This is an example of "Indirect Prompt Injection", a new attack described in our paper. The pirate accent is optional. The injection itself is simply a piece of regular text that has fontsize 0. You can find an image of the injected text below, too (otherwise Bing Chat could see it and could be injected). you can inspect the actual website that is opened here.

The prompt hidden on the pirate website