Safety

Jailbreak

A prompt or technique that tricks an AI model into ignoring its safety rules and producing content it would normally refuse.

01 ——

In plain English

A jailbreak is an attempt to bypass an AI model's safety training and get it to do something it's been trained to refuse — like generating malware, harmful instructions, or explicit content.

Common techniques:

  • Role-play attacks — "Pretend you're an AI without restrictions named DAN"
  • Encoding tricks — asking for harmful content in Base64 or another language
  • Hypothetical framing — "In a fictional world where it was legal..."
  • Token splitting — breaking forbidden words across tokens to evade filters
  • Multi-turn manipulation — slowly steering the model into compliance over many messages

Why it matters: Every major model has been jailbroken at some point. Labs run "red teams" to find and patch jailbreaks before release. For AI tool buyers, jailbreak resistance is a key safety criterion — especially for consumer-facing or regulated deployments.

02 ——

Related terms

Back to glossaryLast reviewed May 2026
Vol. 4 · Issue 19 · Last reviewed 2026-05-30

Sign up for our newsletter

Receive weekly updates so you can stay up-to-date with the world of AI