2023-5-21 arXiv roundup: Parallel transformers, Optimized data mixtures, Don't trust LLM chains-of-thought

This newsletter is made possible by MosaicML.

Accelerating Transformer Inference for Translation via Parallel Decoding

The authors speed up inference in autoregressive text models without changing the models or requiring any extra training. They do this by generating tokens in parallel, rather than sequentially.
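The summary of the parallel decoding paper does not spell out the algorithm, so the following is only a minimal sketch of one way to generate tokens in parallel without retraining, assuming a Jacobi-style fixed-point iteration over greedy decoding. Every position of a draft output is re-predicted at once from the previous draft until nothing changes, at which point the draft equals the sequential greedy output. `toy_next_token`, `VOCAB_SIZE`, `PAD`, and the example source sentence are hypothetical stand-ins, not anything from the paper; a real transformer would compute all positions in one causally masked forward pass.

```python
from typing import List, Sequence, Tuple

VOCAB_SIZE = 50  # hypothetical toy vocabulary size
PAD = 0          # hypothetical placeholder token used to fill the initial draft


def toy_next_token(source: Sequence[int], prefix: Sequence[int]) -> int:
    """Hypothetical stand-in for a model's greedy (argmax) next-token choice.

    Even positions depend only on the source; odd positions depend on the
    previous output token. This is an assumption made for illustration.
    """
    i = len(prefix)
    if i % 2 == 0:
        return (source[i % len(source)] * 7 + i) % VOCAB_SIZE
    return (prefix[-1] + 3) % VOCAB_SIZE


def greedy_decode(source: Sequence[int], length: int) -> List[int]:
    """Standard sequential decoding: one model call per output token."""
    out: List[int] = []
    for _ in range(length):
        out.append(toy_next_token(source, out))
    return out


def jacobi_decode(source: Sequence[int], length: int) -> Tuple[List[int], int]:
    """Parallel decoding as a fixed-point iteration over the whole draft.

    Returns the decoded tokens and the number of parallel steps taken.
    """
    draft = [PAD] * length
    iterations = 0
    while True:
        iterations += 1
        # Each position is predicted from the *previous* draft, so all
        # positions are independent within a step. The list comprehension
        # stands in for a single batched forward pass of a real model.
        new_draft = [toy_next_token(source, draft[:i]) for i in range(length)]
        if new_draft == draft:  # fixed point reached: matches greedy output
            return draft, iterations
        draft = new_draft


source_sentence = [5, 12, 9]  # hypothetical source token ids
sequential = greedy_decode(source_sentence, 8)
parallel, steps = jacobi_decode(source_sentence, 8)
print(sequential == parallel, steps)
```

In this sketch the loop needs at most one more step than the output length, because each step fixes at least one more leading token, so any speedup comes from cases where several tokens settle in the same step. The final step only confirms that the draft has stopped changing.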