Safety

Watermarking

Embedding a hidden, machine-detectable signal in AI-generated content so it can later be identified as AI-made.

01 ——

In plain English

Watermarking is the technique of marking AI outputs — text, images, audio, video — with a hidden signal that lets you (or a detector) verify later that the content was AI-generated. It's one of the main proposed defences against misuse of generative AI.

How it works:

Text watermarking: biases the model's token choices toward a keyed statistical pattern that readers cannot notice but a verifier holding the key can detect (Google's SynthID-Text, OpenAI's research)
Image watermarking: embeds imperceptible perturbations in the pixels (SynthID-Image, Stable Signature)
Audio / video: applies similar perturbations to spectrograms or video frames
Metadata: the C2PA standard attaches cryptographically signed provenance data to the file, rather than hiding a signal in the content itself (Adobe Content Credentials)
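
To make the text case concrete, here is a minimal sketch of one publicly described approach, the keyed "green list" scheme from the research literature. It is not SynthID-Text's or OpenAI's actual method. The vocabulary, the key, the `GAMMA` fraction, and the uniform-sampling stand-in for a language model are all assumptions made for illustration.

```python
import hashlib
import math
import random

VOCAB = [f"tok{i}" for i in range(1000)]  # hypothetical toy vocabulary
GAMMA = 0.5  # assumed fraction of tokens that count as "green" at each step


def is_green(prev_token: str, token: str, key: str = "demo-key") -> bool:
    """A keyed hash of (previous token, candidate) decides green-list membership."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return digest[0] < 256 * GAMMA


def generate(n_tokens: int, watermark: bool, seed: int = 0) -> list[str]:
    """Stand-in for a language model: picks among 8 random candidates per step.

    With watermark=True it keeps only the green candidates (when there are any),
    which is the "hard" variant; real schemes usually nudge probabilities instead.
    """
    rng = random.Random(seed)
    tokens = ["<s>"]
    for _ in range(n_tokens):
        candidates = rng.sample(VOCAB, 8)
        if watermark:
            green = [t for t in candidates if is_green(tokens[-1], t)]
            candidates = green or candidates
        tokens.append(rng.choice(candidates))
    return tokens[1:]


def z_score(tokens: list[str]) -> float:
    """How far the green-token count sits above what unmarked text would give."""
    prev, hits = "<s>", 0
    for token in tokens:
        hits += is_green(prev, token)
        prev = token
    n = len(tokens)
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))


print("watermarked z-score:", round(z_score(generate(200, watermark=True)), 1))
print("plain z-score:      ", round(z_score(generate(200, watermark=False)), 1))
```

The detector never needs the model, only the key: unmarked text lands near a z-score of 0, while marked text scores far above it. A real deployment would pick a threshold that trades off false positives against missed detections.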

Where it's deployed:

Google SynthID: applied to Imagen, Veo, MusicLM, and Gemini text outputs
Adobe Content Credentials (C2PA): Photoshop, Firefly
Meta: marks AI-generated imagery on its platforms
OpenAI: DALL-E images include C2PA metadata

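Limits: Watermarks and provenance metadata can be stripped or degraded by screenshotting, re-encoding, or, for text, simply paraphrasing. Watermarking establishes provenance when everyone in the chain cooperates, but it doesn't stop a motivated bad actor, so a missing watermark is not proof that content is human-made.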
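
The screenshot case is easy to see in a toy model. The sketch below is an assumption-heavy simplification: `MediaFile`, `attach_provenance`, `check`, and `screenshot` are hypothetical names, and a shared-secret HMAC stands in for the certificate-based signatures that C2PA actually uses.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional

SIGNING_KEY = b"hypothetical-signer-key"  # stand-in for a signer's private key


@dataclass
class MediaFile:
    pixels: bytes                    # the visible content
    manifest: Optional[dict] = None  # provenance metadata stored alongside it


def _signature(pixels: bytes, claim: str) -> str:
    # Bind the claim to this exact content by signing a hash of the pixels.
    message = hashlib.sha256(pixels).digest() + claim.encode()
    return hmac.new(SIGNING_KEY, message, hashlib.sha256).hexdigest()


def attach_provenance(file: MediaFile, claim: str) -> MediaFile:
    manifest = {"claim": claim, "signature": _signature(file.pixels, claim)}
    return MediaFile(file.pixels, manifest)


def check(file: MediaFile) -> str:
    if file.manifest is None:
        return "no provenance"
    expected = _signature(file.pixels, file.manifest["claim"])
    if hmac.compare_digest(expected, file.manifest["signature"]):
        return "verified"
    return "tampered"


def screenshot(file: MediaFile) -> MediaFile:
    """A screenshot copies what is on screen; the metadata does not come along."""
    return MediaFile(file.pixels)


signed = attach_provenance(MediaFile(b"\x10\x20\x30"), "generated by an AI model")
print(check(signed))              # verified
print(check(screenshot(signed)))  # no provenance
```

The signature makes the provenance claim hard to forge, but nothing forces it to stay attached. The screenshot comes back as "no provenance", which looks the same as a file that never carried a manifest.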

02 ——