Tweet by Jay Hack @mathemagic1an

Speculative Sampling: Accelerating Text Generation


🧵 The TL;DR

DeepMind has developed a way to use a smaller/faster model to generate K (potentially obvious) tokens that are then checked by a slower/smarter model, resulting in a 2x+ speedup on natural language generation.
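A minimal sketch of how the accept/reject scheme from the paper can work, assuming hypothetical `draft_probs` and `target_probs` callables that return next-token distributions for the small draft model and the large target model; this is an illustration, not DeepMind's actual implementation:

```python
import numpy as np

def speculative_step(prefix, draft_probs, target_probs, K, rng):
    """One speculative decoding step: the draft model proposes K tokens,
    the target model verifies them. Returns the accepted tokens."""
    # 1) Draft model proposes K tokens autoregressively (cheap).
    drafted, q = [], []
    ctx = list(prefix)
    for _ in range(K):
        q_t = draft_probs(ctx)                 # draft distribution q(. | ctx)
        tok = rng.choice(len(q_t), p=q_t)
        drafted.append(tok)
        q.append(q_t)
        ctx.append(tok)

    # 2) Target model scores the prefix plus all K drafted tokens.
    #    In practice this is a single parallel forward pass, which is
    #    where the speedup comes from.
    p = [target_probs(list(prefix) + drafted[:t]) for t in range(K + 1)]

    # 3) Accept each drafted token with probability min(1, p(tok)/q(tok)).
    accepted = []
    for t, tok in enumerate(drafted):
        if rng.random() < min(1.0, p[t][tok] / q[t][tok]):
            accepted.append(tok)
        else:
            # Rejected: resample from the normalized residual (p - q)_+,
            # which keeps the output distribution equal to the target's.
            residual = np.maximum(p[t] - q[t], 0.0)
            residual /= residual.sum()
            accepted.append(rng.choice(len(residual), p=residual))
            return accepted

    # All K drafts accepted: take one bonus token from the target model.
    accepted.append(rng.choice(len(p[K]), p=p[K]))
    return accepted
```

Because the acceptance rule corrects for the mismatch between the two models, the output distribution matches the large model's while most tokens only cost a draft-model forward pass.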


🔑 Key Points

  • DeepMind has developed a technique called Speculative Sampling
  • This technique uses a small/fast model to quickly generate K (potentially obvious) tokens
  • The slower/smarter model checks the work of the small model, resulting in a 2x+ speedup on natural language generation
