Safety

Data Poisoning

An attack that corrupts a model’s training data to make it behave incorrectly — either degrading performance or installing hidden backdoors.

01 ——

In plain English

Data poisoning is an attack on the AI supply chain: an attacker injects bad data into a training set so the resulting model behaves in a way they want. Because frontier models scrape much of the public web, the threat is concrete and growing.

Two main types:

  • Availability attacks — degrade the model's general performance (subtle quality drops across many tasks)
  • Backdoor attacks — make the model behave normally except when triggered by a specific phrase, image, or pattern — at which point it produces malicious output

Real-world surface area:

  • Web-scraped pretraining data (anyone with a website can try)
  • Open datasets (Common Crawl, LAION, Stack Overflow dumps)
  • User-uploaded fine-tuning data
  • RAG corpora (a poisoned doc in a vector DB can hijack answers)

Defences: Data filtering, provenance tracking, training-data deduplication, and post-hoc red teaming. No defence is foolproof; this is an active research area.

02 ——

Related terms

Back to glossaryLast reviewed May 2026
Vol. 4 · Issue 19 · Last reviewed 2026-05-30

Sign up for our newsletter

Receive weekly updates so you can stay up-to-date with the world of AI