Watermarking
Embedding a hidden, machine-detectable signal in AI-generated content so it can later be identified as AI-made.
In plain English
Watermarking is the technique of marking AI outputs — text, images, audio, video — with a hidden signal that lets you (or a detector) verify later that the content was AI-generated. It's one of the main proposed defences against misuse of generative AI.
How it works:
- Text watermarking — biases the model's token sampling in a keyed statistical pattern that is invisible to readers but detectable by a verifier holding the key (Google's SynthID-Text, OpenAI's research)
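- Image watermarking — embeds imperceptible perturbations in the pixels (SynthID-Image, Stable Signature)
- Audio / video — embeds similar imperceptible perturbations in spectrograms (audio) or frames (video)
- Metadata — the C2PA standard attaches a cryptographically signed provenance manifest to the file (Adobe Content Credentials); strictly speaking this is provenance metadata stored alongside the content, not a signal embedded in it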
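To make the text case concrete, here is a minimal sketch of a "green-list" style watermark, one published family of text watermarking schemes. It is not SynthID-Text's or OpenAI's actual algorithm. The toy vocabulary, the `generate` stand-in for a language model, and the names `is_green` and `z_score` are all assumptions made for illustration.

```python
import hashlib
import math
import random

VOCAB = [f"tok{i}" for i in range(1000)]  # hypothetical toy vocabulary
GAMMA = 0.5  # expected fraction of tokens that count as "green" at each step


def is_green(prev_token: str, token: str, key: str) -> bool:
    """Keyed pseudo-random split of the vocabulary, seeded by the previous token."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return digest[0] / 256 < GAMMA


def generate(n_tokens: int, key: str, bias: float, seed: int = 0) -> list:
    """Stand-in for a language model (assumption): picks among 20 random
    candidates, and with probability `bias` restricts the pick to green ones."""
    rng = random.Random(seed)
    prev = "<s>"
    out = []
    for _ in range(n_tokens):
        candidates = rng.sample(VOCAB, 20)
        if rng.random() < bias:
            green = [t for t in candidates if is_green(prev, t, key)]
            candidates = green or candidates  # fall back if no candidate is green
        prev = rng.choice(candidates)
        out.append(prev)
    return out


def z_score(tokens: list, key: str) -> float:
    """Count green tokens and compare with the count expected by chance."""
    prev = "<s>"
    hits = 0
    for tok in tokens:
        hits += is_green(prev, tok, key)
        prev = tok
    n = len(tokens)
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))


if __name__ == "__main__":
    watermarked = generate(200, key="secret", bias=1.0)
    plain = generate(200, key="secret", bias=0.0)
    print("watermarked z:", round(z_score(watermarked, "secret"), 2))
    print("plain z:      ", round(z_score(plain, "secret"), 2))
```

The verifier never needs the model, only the key: it recomputes which tokens were green and checks whether there are more of them than chance would explain. If all 200 tokens are green, z = (200 - 100) / sqrt(50), roughly 14, while unwatermarked text should land near 0. A real scheme nudges the model's logits rather than hard-filtering candidates, so that text quality is preserved.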
Where it's deployed:
- Google SynthID — applied to Imagen, Veo, MusicLM, Gemini text outputs
- Adobe Content Credentials (C2PA) — Photoshop, Firefly
- Meta — invisible watermarks and visible labels on images generated with Meta AI
- OpenAI — DALL-E 3 images carry C2PA metadata (signed provenance rather than a pixel-level watermark)
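Limits: No current scheme is robust against a determined adversary. C2PA metadata is lost when an image is screenshotted or re-encoded by a tool that drops it. Pixel and audio watermarks are designed to survive mild compression and cropping but can be degraded by heavy editing. Text watermarks weaken when the text is paraphrased, and short passages carry too little statistical signal to detect reliably. Watermarking gives honest actors a way to establish provenance, but it doesn't stop motivated bad actors.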
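The fragility of metadata-based provenance can be illustrated with a minimal sketch. It binds a signed claim to the exact bytes of a file, then shows what a verifier sees after the bytes change or the metadata is dropped. This is not the C2PA format: C2PA uses certificate-based signatures, whereas the sketch substitutes a shared-secret HMAC to stay self-contained, and `SIGNING_KEY`, `make_manifest`, and `verify` are hypothetical names.

```python
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical shared secret; real C2PA manifests are signed with certificates.
SIGNING_KEY = b"demo-key"


def make_manifest(content: bytes, generator: str) -> dict:
    """Bind a provenance claim to the exact bytes of the content."""
    claim = {"generator": generator, "sha256": hashlib.sha256(content).hexdigest()}
    payload = json.dumps(claim, sort_keys=True).encode()
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {"claim": claim, "signature": signature}


def verify(content: bytes, manifest: Optional[dict]) -> str:
    """Return a short status string describing what the verifier can conclude."""
    if manifest is None:
        return "no provenance"  # metadata was stripped, e.g. by a screenshot
    payload = json.dumps(manifest["claim"], sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, manifest["signature"]):
        return "tampered manifest"
    if manifest["claim"]["sha256"] != hashlib.sha256(content).hexdigest():
        return "content changed"  # bytes no longer match, e.g. after re-encoding
    return "verified"


if __name__ == "__main__":
    image = b"fake image bytes"
    manifest = make_manifest(image, "example-image-model")
    print(verify(image, manifest))            # untouched file
    print(verify(image + b"\x00", manifest))  # re-encoded file
    print(verify(image, None))                # metadata dropped
```

Note that "no provenance" and "content changed" tell the verifier nothing about whether the content was AI-generated. The absence of a valid manifest is not evidence either way, which is why metadata helps honest publishers more than it hinders adversaries.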