Training

Post-training

Everything done to a model after pretraining — fine-tuning, RLHF, DPO, safety training — to turn a raw base model into a usable product.

01 ——

In plain English

Post-training is the bundle of steps that take a base model (trained with next-token prediction on web-scale text) and turn it into a model people can actually use. A base model is fluent but unhelpful: it continues text rather than following instructions. Post-training is what makes a model such as Claude or GPT polite, useful, and willing to refuse obviously abusive requests.

Stages of post-training (typical):

  1. Supervised fine-tuning (SFT) — train on curated instruction-response pairs
  2. Reward modelling — train a model to score response quality
  3. RLHF or DPO — align the LLM to the reward signal / preference data
  4. Safety training — refuse-when-appropriate behaviour, jailbreak resistance
  5. Capability training — tool use, JSON output, agentic behaviours
  6. Evaluation and red-teaming — find failures, repeat from step 1
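
To make steps 2 and 3 concrete, here is a minimal sketch of the two preference losses as they are commonly described in the published literature: a pairwise (Bradley-Terry) loss for reward modelling and the DPO loss for a single preference pair. The function names, the argument names, and the default beta of 0.1 are illustrative assumptions, not any particular lab's recipe, and real training code would compute these over batches of token log-probabilities in a tensor library.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    # Branch on the sign of x so math.exp never receives a large positive value.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def reward_model_loss(score_chosen: float, score_rejected: float) -> float:
    """Pairwise loss for step 2 (hypothetical helper).

    Small when the reward model scores the preferred response
    higher than the rejected one.
    """
    return -log_sigmoid(score_chosen - score_rejected)


def dpo_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # assumed value; controls how far the policy may drift
) -> float:
    """DPO loss for one preference pair in step 3 (hypothetical helper).

    Inputs are sequence log-probabilities of each response under the
    model being trained (policy) and a frozen reference model.
    """
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))
```

When the policy and the reference assign identical log-probabilities, the DPO loss equals log 2 (about 0.693); it drops below that as the policy shifts probability toward the preferred response relative to the reference. The design difference is that DPO needs no separate reward model: the log-probability margins play the role of the reward score.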

Why it dominates the gap between models: two labs starting from similar pretraining recipes can ship very different end products because of post-training alone. Most "this model feels better" judgments come down to post-training data quality and method.


Vol. 4 · Issue 19 · Last reviewed 2026-05-30
