The Evolution of GPT
The Evolution of GPT
GPT (1): Improving Language Understanding by Generative Pre-Training
- Simple transformer architecture.
- Pre-trained to perform next word prediction.
- For downstream tasks, define input sequence pattern, fine-tune with additional, final linear layer.
GPT-2: Language Models are Unsupervised Multitask Learners
-
Standard language modelling & use of transformers (like from Attention is All you Need).
-
A general system should be able to handle many tasks. Condition output on input and task. I.e. instead of $p(output input)$ (perhaps per-task), consider $p(output input, task)$.
GPT-3: Language Models are Few-Shot Learners
- “While typically task-agnostic in architecture, this method _[Generative pre-training] still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions – something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine- tuning approaches.”_
- In essence, large enough LLMs can perform on zero/few-shot tasks at SOTA
InstructGPT: Training language models to follow instructions with human feedback
- Introduces new RL technique: Reinforcement Learning from Human Feedback (RLHF).
- Results show human preferred outputs from 1.3B parameters InstructGPT over 175B GPT-3.