Microsoft knows you love tricking its AI chatbots into doing weird stuff and it’s designing ‘prompt shields’ to stop you

Microsoft Corp. is trying to make it harder for people to trick artificial intelligence chatbots into doing weird things. 

New safety features are being built into Azure AI Studio which lets developers build customized AI assistants using their own data, the Redmond, Washington-based company said in a blog post on Thursday. 

The tools include “prompt shields,” which are designed to detect and block deliberate attempts — also known as prompt injection attacks or jailbreaks  — to make an AI model behave in an unintended way. Microsoft is also addressing “indirect prompt injections,” when hackers insert malicious instructions into the data a model is trained on and trick it into performing such unauthorized actions as stealing user information or hijacking a system. 

Such attacks are “a unique challenge and threat,” said Sarah Bird, Microsoft’s chief product officer of responsible AI. The new defenses are designed to spot suspicious inputs and block them in real time, she said. Microsoft is also rolling out a feature that alerts users when a model makes things up or generates erroneous responses.

Microsoft is keen to boost trust in its generative AI tools, which are now being used by consumers and corporate customers alike. In February, the company investigated incidents involving its Copilot chatbot, which was generating responses that ranged from weird to harmful. After reviewing the incidents, Microsoft said users had deliberately tried to fool Copilot into generating the responses.

“Certainly we see it increasing as there’s more use of the tools but also as more people are aware of these different techniques,” Bird said. Tell-tale signs of such attacks include asking a chatbot a question multiple times or prompts that describe role-playing. 

Microsoft is OpenAI’s largest investor and has made the partnership a key part of its AI strategy. Bird said Microsoft and OpenAI are dedicated to deploying AI safely and building protections into the large language models underlying generative AI. 

“However, you can’t rely on the model alone,” she said. “These jailbreaks for example, are an inherent weakness of the model technology.” 

Subscribe to the Eye on AI newsletter to stay abreast of how AI is shaping the future of business. Sign up for free.

Click Here To Read More

Related posts