Post-training
Everything done to a model after pretraining — fine-tuning, RLHF, DPO, safety training — to turn a raw base model into a usable product.
In plain English
Post-training is the bundle of steps that take a base model (trained on next-token prediction over web-scale text) and turn it into a model people can actually use. A base model is fluent but unhelpful: it continues text rather than following instructions. Post-training is what makes Claude or GPT polite, useful, and willing to refuse obvious abuse.
Stages of post-training (typical):
- Supervised fine-tuning (SFT) — train on curated instruction-response pairs
- Reward modelling — train a model to score response quality
- RLHF or DPO — optimise the LLM against the reward model (RLHF), or train it directly on preference pairs with no separate reward model (DPO)
- Safety training — refuse-when-appropriate behaviour, jailbreak resistance
- Capability training — tool use, JSON output, agentic behaviours
- Evaluation and red-teaming — find failures, then feed them back into the earlier stages and repeat
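The reward-modelling and RLHF/DPO stages in the list above both come down to a loss over preference pairs (a "chosen" and a "rejected" response to the same prompt). The sketch below shows the standard pairwise (Bradley-Terry) reward-model loss and the DPO loss for a single pair, using plain floats instead of tensors. The function names, argument names, and the default `beta=0.1` are illustrative assumptions, not any particular lab's or library's implementation.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def reward_model_loss(score_chosen: float, score_rejected: float) -> float:
    """Pairwise reward-model loss: push the chosen response's score
    above the rejected response's score."""
    return -log_sigmoid(score_chosen - score_rejected)


def dpo_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # assumed value; controls how far the policy may drift
) -> float:
    """DPO loss for one preference pair.

    Inputs are summed log-probabilities of each full response under the
    model being trained (policy) and under a frozen reference model.
    No separate reward model is involved.
    """
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))
```

When the policy equals the reference model, both margins are zero and the DPO loss is log 2 (about 0.693). The loss falls as the policy raises the chosen response's probability relative to the rejected one, measured against the reference. In real training these quantities would be batched tensors in an autodiff framework, but the arithmetic is the same.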
Why it dominates the gap between models: two labs starting from similar pretraining recipes can produce wildly different end products depending on how they post-train. Most "this model feels better" judgments come down to post-training data quality and method.