Why watermarking AI-generated content won’t guarantee trust online

Further complicating matters, watermarking is often used as a “catch-all” term for the general act of providing content disclosures, even though there are many distinct methods. A closer read of the White House commitments reveals another disclosure method, known as provenance, which relies on cryptographic signatures rather than invisible signals. The popular press, however, often describes this as watermarking too. If you find this mish-mash of terms confusing, rest assured you’re not the only one. But clarity matters: the AI sector cannot implement consistent and robust transparency measures if there is not even agreement on how we refer to the different techniques.
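To make the distinction concrete, here is a minimal sketch of signature-based provenance: instead of hiding a signal inside the content, a signed record travels alongside it. Everything named here (the key, the manifest fields, the function names) is a hypothetical illustration, not any company's actual scheme. Real provenance systems generally use public-key signatures; an HMAC with a shared secret is swapped in only to keep the example within Python's standard library.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret. A real system would use a private signing key
# with a public verification key, not a secret shared with verifiers.
SECRET_KEY = b"demo-key-not-for-production"


def sign_manifest(content: bytes, generator: str) -> dict:
    """Build a provenance record: a hash of the content plus a signature."""
    manifest = {
        "sha256": hashlib.sha256(content).hexdigest(),
        "generator": generator,  # illustrative field name
    }
    payload = json.dumps(manifest, sort_keys=True).encode("utf-8")
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return {"manifest": manifest, "signature": signature}


def verify_manifest(content: bytes, record: dict) -> bool:
    """Check that the record is untampered and matches this exact content."""
    payload = json.dumps(record["manifest"], sort_keys=True).encode("utf-8")
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    signature_ok = hmac.compare_digest(expected, record["signature"])
    content_ok = record["manifest"]["sha256"] == hashlib.sha256(content).hexdigest()
    return signature_ok and content_ok
```

In this sketch, changing even one byte of the content, or editing the record's claims, causes verification to fail. Nothing is embedded in the content itself, though, so the disclosure disappears if the record is simply discarded, which is one reason provenance and invisible watermarking are not interchangeable.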

I’ve come up with six initial questions that could help us evaluate the usefulness of watermarks and other disclosure methods for AI. These should help make sure different parties are discussing the exact same thing, and that we can evaluate each method in a thorough, consistent manner.

Can the watermark itself be tampered with?

Ironically, the technical signals touted as helpful for gauging where content comes from and how it is manipulated can sometimes be manipulated themselves. While it’s difficult, both invisible and visible watermarks can be removed or altered, rendering them useless for telling us what is and isn’t synthetic. And notably, the ease with which they can be manipulated varies according to what type of content you’re dealing with.

Is the watermark’s durability consistent for different content types?

While invisible watermarking is often promoted as a broad solution for dealing with generative AI, such embedded signals are much more easily manipulated in text than in audiovisual content. That likely explains why the White House’s summary document suggests that watermarking will be applied to all types of AI-generated content, while the full text makes clear that companies committed to disclosures only for audiovisual material. AI policymaking must therefore be specific about how disclosure techniques like invisible watermarking vary in their durability and broader technical robustness across different content types. One disclosure solution may be great for images, but useless for text.
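A toy example suggests why text is such a hard case. The sketch below hides bits in a sentence using zero-width Unicode characters, then shows that a one-line cleanup erases them. This is a deliberately simple stand-in: the function names are hypothetical, and watermarks proposed for language models are generally statistical rather than character-based, so this illustrates the fragility of the approach, not any deployed scheme.

```python
ZERO = "\u200b"  # zero-width space, encodes bit 0
ONE = "\u200c"   # zero-width non-joiner, encodes bit 1


def embed(text: str, bits: str) -> str:
    """Append one invisible character to each of the first len(bits) words."""
    words = text.split(" ")
    if len(bits) > len(words):
        raise ValueError("text is too short to carry these bits")
    marked = [word + (ONE if bit == "1" else ZERO) for word, bit in zip(words, bits)]
    return " ".join(marked + words[len(bits):])


def extract(text: str) -> str:
    """Read the hidden bits back out, in order of appearance."""
    return "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))


def strip_invisible(text: str) -> str:
    """Remove the hidden characters, as ordinary text cleanup might."""
    return text.replace(ZERO, "").replace(ONE, "")
```

A usage example under the same assumptions:

```python
marked = embed("this sentence was written by a model", "1011")
print(extract(marked))                   # 1011
print(extract(strip_invisible(marked)))  # prints an empty line: the mark is gone
```

The marked sentence looks identical to the original on screen, yet retyping it or running it through a basic cleanup removes the signal entirely. An image or audio file offers far more room to hide a signal that survives routine handling.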

Who can detect these invisible signals?

Even if the AI sector agrees to implement invisible watermarks, deeper questions are inevitably going to emerge around who has the capacity to detect these signals and eventually make authoritative claims based on them. Who gets to decide whether content is AI-generated, and perhaps as an extension, whether it is misleading? If everyone can detect watermarks, that might render them susceptible to misuse by bad actors. On the other hand, controlled access to detection of invisible watermarks—especially if it is dictated by large AI companies—might degrade openness and entrench technical gatekeeping. Implementing these sorts of disclosure methods without working out how they’re governed could leave them distrusted and ineffective. And if the techniques are not widely adopted, bad actors might turn to open-source technologies that lack the invisible watermarks to create harmful and misleading content.

Do watermarks preserve privacy?

As key work from Witness, a human rights and technology group, makes clear, any tracing system that travels with a piece of content over time might also introduce privacy issues for those creating the content. The AI sector must ensure that watermarks and other disclosure techniques are designed in a manner that does not include identifying information that might put creators at risk. For example, a human rights defender might capture abuses through photographs that are watermarked with identifying information, making the person an easy target for an authoritarian government. Even the knowledge that watermarks could reveal an activist’s identity might have chilling effects on expression and speech. Policymakers must provide clearer guidance on how disclosures can be designed so as to preserve the privacy of those creating content, while also including enough detail to be useful and practical.

Do visible disclosures help audiences understand the role of generative AI?

Even if invisible watermarks are technically durable and privacy preserving, they might not help audiences interpret content. Though direct disclosures like visible watermarks have an intuitive appeal for providing greater transparency, such disclosures do not necessarily achieve their intended effects, and they can often be perceived as paternalistic, biased, and punitive, even when they are not saying anything about the truthfulness of a piece of content. Furthermore, audiences might misinterpret direct disclosures. A participant in my 2021 research misinterpreted Twitter’s “manipulated media” label as suggesting that the institution of “the media” was manipulating him, not that the content of the specific video had been edited to mislead. While research is emerging on how different user experience designs affect audience interpretation of content disclosures, much of it is concentrated within large technology companies and focused on distinct contexts, like elections. Studying the efficacy of direct disclosures and user experiences, and not merely relying on the visceral appeal of labeling AI-generated content, is vital to effective policymaking for improving transparency.

Could visibly watermarking AI-generated content diminish trust in “real” content?

Perhaps the thorniest societal question to evaluate is how coordinated, direct disclosures will affect broader attitudes toward information and potentially diminish trust in “real” content. If AI organizations and social media platforms are simply labeling the fact that content is AI-generated or modified—as an understandable, albeit limited, way to avoid making judgments about which claims are misleading or harmful—how does this affect the way we perceive what we see online?
