Transformer
The neural network architecture, introduced in 2017, that underpins nearly every modern LLM and many image generators.
In plain English
The Transformer is a neural network architecture introduced by Google researchers in the 2017 paper "Attention Is All You Need." It became the foundation for most major AI breakthroughs since: GPT, BERT, Claude, and Gemini are all transformers, and image generators such as Stable Diffusion rely on transformer components.
Why transformers won:
- Attention mechanism — the model decides which parts of the input matter most for each output token
- Parallelisation — earlier architectures (RNNs) processed text one word at a time; transformers process the whole input at once, training much faster
- Scalability — quality keeps improving as you scale up data, parameters, and compute
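To make the attention idea concrete, here is a minimal sketch of scaled dot-product attention in plain Python. It is an illustration under simplifying assumptions, not a production implementation: the function names and the toy `tokens` vectors are made up for this example, and the same vectors stand in for queries, keys, and values, whereas a real transformer derives each from learned projection matrices.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # How strongly does this position match every position in the input?
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Output is a weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy input: three "tokens", each a 2-dimensional vector (assumed values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
```

Each row of `weights` says how much one position draws on every other position. The loop over `queries` has no dependency between iterations, which is why real implementations can compute all positions at once as a single matrix multiplication instead of stepping through the text the way an RNN does.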
Variants:
- Decoder-only — GPT, Claude, Llama (used for chat and generation)
- Encoder-only — BERT (used for classification and search)
- Encoder-decoder — T5, BART (used for translation and summarisation)
- Vision Transformers (ViT) — used in computer vision
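One practical difference between the encoder and decoder variants above is which positions each token is allowed to attend to. The sketch below assumes the usual setup: encoder-style attention sees the whole input, while decoder-style attention applies a causal mask so a token only sees itself and earlier tokens. The function name and the all-zero toy scores are hypothetical, chosen only to make the effect of the mask visible.

```python
import math

def attention_weights(scores, causal):
    """Softmax each row of a square score matrix, optionally hiding future positions."""
    rows = []
    for i, row in enumerate(scores):
        # With a causal mask, position i may only look at positions 0..i.
        visible = [
            s if (not causal or j <= i) else -math.inf
            for j, s in enumerate(row)
        ]
        peak = max(visible)  # the diagonal is always visible, so this is finite
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows

# Toy scores: every pair of positions matches equally well (assumed values).
scores = [[0.0] * 3 for _ in range(3)]

encoder_style = attention_weights(scores, causal=False)  # every row is uniform
decoder_style = attention_weights(scores, causal=True)   # future positions get weight 0
```

With these toy scores, the second row of `decoder_style` is `[0.5, 0.5, 0.0]`: the second token splits its attention between the first token and itself and ignores the third. This masking is what lets a decoder-only model be trained to predict the next token without seeing it. Encoder-decoder models typically combine both patterns, and Vision Transformers apply the unmasked form to image patches instead of words.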
The transformer is arguably the most important AI invention of the last decade.