OpenAI unveiled new ChatGPT features that include the ability to have a conversation with the chatbot as if you were making a call, allowing you to instantly get responses to your spoken questions in a lifelike synthetic voice, as my colleague Will Douglas Heaven reported. OpenAI also revealed that ChatGPT will be able to search the web.
Google’s rival bot, Bard, is plugged into most of the company’s ecosystem, including Gmail, Docs, YouTube, and Maps. The idea is that people will be able to use the chatbot to ask questions about their own content—for example, by getting it to search through their emails or organize their calendar. Bard will also be able to instantly retrieve information from Google Search. In a similar vein, Meta too announced that it is throwing AI chatbots at everything. Users will be able to ask AI chatbots and celebrity AI avatars questions on WhatsApp, Messenger, and Instagram, with the AI model retrieving information online from Bing search.
This is a risky bet, given the limitations of the technology. Tech companies have not solved some of the persistent problems with AI language models, such as their propensity to make things up or “hallucinate.” But what concerns me the most is that they are a security and privacy disaster, as I wrote earlier this year. Tech companies are putting this deeply flawed tech in the hands of millions of people and allowing AI models access to sensitive information such as their emails, calendars, and private messages. In doing so, they are making us all vulnerable to scams, phishing, and hacks on a massive scale.
I’ve covered the significant security problems with AI language models before. Now that AI assistants have access to personal information and can simultaneously browse the web, they are particularly prone to a type of attack called indirect prompt injection. It’s ridiculously easy to execute, and there is no known fix.
In an indirect prompt injection attack, a third party “alters a website by adding hidden text that is meant to change the AI’s behavior,” as I wrote in April. “Attackers could use social media or email to direct users to websites with these secret prompts. Once that happens, the AI system could be manipulated to let the attacker try to extract people’s credit card information, for example.” With this new generation of AI models plugged into social media and emails, the opportunities for hackers are endless.
I asked OpenAI, Google, and Meta what they are doing to defend against prompt injection attacks and hallucinations. Meta did not reply in time for publication, and OpenAI did not comment on the record.
Regarding AI’s propensity to make things up, a spokesperson for Google did say the company was releasing Bard as an “experiment,” and that it lets users fact-check Bard’s answers using Google Search. “If users see a hallucination or something that isn’t accurate, we encourage them to click the thumbs-down button and provide feedback. That’s one way Bard will learn and improve,” the spokesperson said. Of course, this approach puts the onus on the user to spot the mistake, and people have a tendency to place too much trust in the responses generated by a computer. Google did not have an answer for my question about prompt injection.
For prompt injection, Google confirmed it is not a solved problem and remains an active area of research. The spokesperson said the company is using other systems, such as spam filters, to identify and filter out attempted attacks, and is conducting adversarial testing and red teaming exercises to identify how malicious actors might attack products built on language models. “We’re using specially trained models to help identify known malicious inputs and known unsafe outputs that violate our policies,” the spokesperson said.