
Safety

Guardrails

Rules and filters that constrain what an AI model can output — used to block harmful, off-topic, or non-compliant responses.

01 ——

In plain English

Guardrails are the safety layers built around an AI model to keep its outputs appropriate, on-topic, and within policy. They're what stops a customer-support bot from giving medical advice, or a kids' app from producing adult content.

Where guardrails sit:

In the prompt — system instructions ("Refuse to discuss competitors")
In the model itself — RLHF training that builds in refusals
Around the model — input/output classifiers that filter requests and responses
At the orchestration layer — rules about which tools the model can call
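As a rough illustration of how these layers can stack, the sketch below wraps a model call with an input check, an output check, and a tool allowlist. Everything in it is hypothetical: the keyword lists stand in for real classifiers, `guarded_call` and `tool_allowed` are made-up helper names, and `model` is any function that takes a system prompt and a user message and returns text.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical keyword lists standing in for real input/output classifiers.
BLOCKED_INPUT_TERMS = ("ignore previous instructions",)
BLOCKED_OUTPUT_TERMS = ("diagnosis", "prescription")

# Orchestration-layer rule: the only tools the model may call (assumed names).
ALLOWED_TOOLS = frozenset({"lookup_order", "create_ticket"})

# Prompt-layer guardrail, reusing the example instruction from the list above.
SYSTEM_PROMPT = "You are a customer-support assistant. Refuse to discuss competitors."
REFUSAL = "Sorry, I can't help with that."


@dataclass
class GuardedReply:
    text: str
    blocked_by: Optional[str] = None  # name of the layer that blocked, if any


def contains_any(text: str, terms: Tuple[str, ...]) -> bool:
    """Case-insensitive substring check; a real system would use a classifier."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


def guarded_call(user_input: str, model: Callable[[str, str], str]) -> GuardedReply:
    # Around the model: filter the request before it reaches the model.
    if contains_any(user_input, BLOCKED_INPUT_TERMS):
        return GuardedReply(REFUSAL, blocked_by="input_filter")

    # In the prompt: system instructions travel with every call.
    raw = model(SYSTEM_PROMPT, user_input)

    # Around the model: filter the response before it reaches the user.
    if contains_any(raw, BLOCKED_OUTPUT_TERMS):
        return GuardedReply(REFUSAL, blocked_by="output_filter")
    return GuardedReply(raw)


def tool_allowed(tool_name: str) -> bool:
    """At the orchestration layer: reject any tool not on the allowlist."""
    return tool_name in ALLOWED_TOOLS
```

Passing a stub such as `lambda system, user: "Your order shipped."` as `model` is enough to try the flow. The in-model layer (RLHF) does not appear here because it lives in the model's weights rather than in application code.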

Common guardrail goals:

Block harmful content (violence, illegal advice, CSAM)
Prevent prompt injection attacks
Keep the bot on-topic for its use case
Protect privacy (redact PII)
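The privacy goal is the easiest to show in a few lines. Below is a minimal sketch of PII redaction using only regular expressions. The patterns and the `redact_pii` name are illustrative assumptions: they catch common email and phone formats only, and production systems typically rely on dedicated PII detectors instead.

```python
import re

# Simplified patterns (assumptions for illustration, not exhaustive).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)  # emails first, so their digits are gone
    return PHONE_RE.sub("[PHONE]", text)


print(redact_pii("Mail jane.doe@example.com or call +1 (555) 010-9999."))
# prints: Mail [EMAIL] or call [PHONE].
```

A filter like this can run on user input before it reaches the model, on model output before it reaches the user, or both.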

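Off-the-shelf libraries (Guardrails AI, NVIDIA's NeMo Guardrails) and major providers (for example, OpenAI's Moderation API) ship guardrail tooling. Anthropic's Constitutional AI is a training approach rather than a standalone tool, so it belongs with the in-model guardrails listed above.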

02 ——

Related terms

Alignment

The challenge of making AI systems behave in ways that match human values and intentions, not just their literal instructions.

Prompt Injection

A security attack where malicious instructions hidden in user input or external content trick an AI model into ignoring its real instructions.

Jailbreak

A prompt or technique that tricks an AI model into ignoring its safety rules and producing content it would normally refuse.

RLHF

Reinforcement Learning from Human Feedback: a training technique that uses human preference ratings to teach AI models to be helpful, harmless, and honest.

Bias

When an AI model's outputs systematically reflect unfair patterns from its training data, for example about gender, race, age, or other group attributes.

Last reviewed June 2026

Vol. 4 · Issue 21 · Last reviewed 2026-06-27
