
Tweet by Andrej Karpathy
@karpathy
Exploring the Transformer Model with GPT-3
🧵 The TL;DR
This tweet thread explores the Transformer, a deep learning architecture used in NLP. It introduces the attention mechanism at the core of the architecture, then builds up the full Transformer component by component. A 10M-parameter model is then trained, compared with OpenAI's GPT-3 and ChatGPT, and sampled from to generate fake Shakespeare.
Key Points