Jailbreak
A prompt or technique that tricks an AI model into ignoring its safety rules and producing content it would normally refuse.
In plain English
A jailbreak is an attempt to bypass an AI model's safety training and get it to do something it's been trained to refuse — like generating malware, harmful instructions, or explicit content.
Common techniques:
- Role-play attacks — "Pretend you're an AI without restrictions named DAN"
- Encoding tricks — asking for harmful content in Base64 or another language
- Hypothetical framing — "In a fictional world where it was legal..."
- Token splitting — breaking forbidden words across tokens to evade filters
- Multi-turn manipulation — slowly steering the model into compliance over many messages
Why it matters: Every major model has been jailbroken at some point. Labs run "red teams" to find and patch jailbreaks before release. For AI tool buyers, jailbreak resistance is a key safety criterion — especially for consumer-facing or regulated deployments.