Training

Post-training

Everything done to a model after pretraining — fine-tuning, RLHF, DPO, safety training — to turn a raw base model into a usable product.

01 ——

In plain English

Post-training is the bundle of steps that take a base model (trained only on next-token prediction over web-scale text) and turn it into a model people can actually use. A base model is fluent but unhelpful: it continues text rather than following instructions. Post-training is what makes Claude or GPT polite, useful, and willing to refuse obviously abusive requests.

Stages of post-training (typical):

1. Supervised fine-tuning (SFT): train on curated instruction-response pairs
2. Reward modelling: train a separate model to score response quality from human preference comparisons
3. RLHF or DPO: align the LLM with those preferences, either by optimising against the reward model (RLHF) or by training directly on preference pairs without an explicit reward model (DPO)
4. Safety training: refuse-when-appropriate behaviour, jailbreak resistance
5. Capability training: tool use, JSON output, agentic behaviours
6. Evaluation and red-teaming: find failures, then repeat from step 1
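
To make the reward modelling and RLHF/DPO stages above more concrete, here is a minimal sketch of the two pairwise losses commonly associated with them: a Bradley-Terry style loss for a reward model, and the DPO loss on a single preference pair. It uses plain Python on scalar values rather than a training framework. The function names, argument names, and the default beta = 0.1 are illustrative assumptions, not any lab's actual recipe.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def reward_model_loss(score_chosen: float, score_rejected: float) -> float:
    """Bradley-Terry pairwise loss for one preference pair.

    The loss shrinks as the reward model scores the preferred
    response higher than the rejected one.
    """
    return -log_sigmoid(score_chosen - score_rejected)


def dpo_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # assumed value; controls how far the policy may drift
) -> float:
    """DPO loss for one preference pair.

    Inputs are sequence log-probabilities under the model being trained
    (policy) and under a frozen reference model (typically the SFT model).
    No separate reward model is involved.
    """
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))
```

When the policy is identical to the reference model, the DPO loss equals log 2 for every pair. It falls only as the policy raises the likelihood of the preferred response relative to the rejected one, measured against the reference model.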

Why it dominates the gap between models: Two labs starting from similar pretraining recipes can ship very different end products depending on their post-training. Many "this model feels better" judgments trace back to the quality of the post-training data and the method used.

02 ——