Tweet by Jay Hack
@mathemagic1an
Speculative Sampling: Accelerating Text Generation
🧵 The TL;DR
DeepMind has developed a way to use a smaller, faster draft model to generate K (potentially obvious) tokens that a slower, smarter target model then checks in a single parallel pass, resulting in a 2x+ speedup on natural language generation.
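A minimal sketch of one draft-then-verify round in Python, assuming toy context-free "models" that return probability lists over a tiny vocabulary. The names `constant_model` and `speculative_step` are hypothetical, and this is not DeepMind's code. It follows the modified rejection sampling rule used in speculative sampling: accept each drafted token x with probability min(1, q(x)/p(x)), where q is the target model's distribution and p is the draft model's; on the first rejection, resample from the normalized residual max(0, q - p) and stop; if all K drafts survive, take one bonus token from the target. This rule keeps the output distribution identical to sampling from the target model alone.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical stand-in for a language model: maps a token prefix to a
# probability distribution over a small vocabulary.
Model = Callable[[Sequence[int]], List[float]]


def constant_model(probs: Sequence[float]) -> Model:
    """Toy model (assumption) that ignores context and returns fixed probs."""
    return lambda ctx: list(probs)


def sample(probs: Sequence[float], rng: random.Random) -> int:
    """Draw one token index; weights need not be normalized."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def speculative_step(prefix: Sequence[int], draft: Model, target: Model,
                     k: int, rng: random.Random) -> List[int]:
    """One round: the draft proposes k tokens, the target accepts or rejects."""
    # 1. Cheap draft model proposes k tokens autoregressively.
    drafted: List[int] = []
    draft_dists: List[List[float]] = []
    ctx = list(prefix)
    for _ in range(k):
        p = draft(ctx)
        tok = sample(p, rng)
        drafted.append(tok)
        draft_dists.append(p)
        ctx.append(tok)

    # 2. Target model scores all k + 1 positions (batched in a real system).
    target_dists = [target(list(prefix) + drafted[:i]) for i in range(k + 1)]

    # 3. Modified rejection sampling, left to right.
    out: List[int] = []
    for i, tok in enumerate(drafted):
        q, p = target_dists[i], draft_dists[i]
        if rng.random() < min(1.0, q[tok] / p[tok]):
            out.append(tok)  # accepted: keep the draft token
            continue
        # Rejected: resample from max(0, q - p) and end the round here.
        residual = [max(0.0, qi - pi) for qi, pi in zip(q, p)]
        out.append(sample(residual, rng))
        return out

    # 4. Every draft accepted: one extra token from the target for free.
    out.append(sample(target_dists[k], rng))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    target = constant_model([0.6, 0.3, 0.1])
    draft = constant_model([0.2, 0.3, 0.5])
    print(speculative_step([], draft, target, k=4, rng=rng))
```

The list comprehension that builds `target_dists` stands in for what a real system does in one batched forward pass of the big model over all K drafted positions. That single pass is where the speedup comes from: each round costs one big-model call but can emit up to K+1 tokens.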
Key Points
Key People