The generation speed of LLMs is bottlenecked by autoregressive decoding, where tokens are predicted sequentially, one at a time. Diffusion large language models (dLLMs), by contrast, theoretically allow multiple tokens to be decoded in parallel at each denoising step, promising much higher throughput.
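To make the contrast concrete, here is a minimal, self-contained sketch of the two decoding loops. It uses a toy random-logit model rather than a real dLLM, and all names (`toy_model`, `MASK_ID`, the confidence-based unmasking rule) are illustrative assumptions, not the API of any specific implementation in this repo:

```python
import torch

# Toy stand-in for a language model: maps a (batch, seq_len) token grid
# to per-position vocabulary logits. Purely illustrative.
VOCAB, MASK_ID = 1000, 0

def toy_model(tokens: torch.Tensor) -> torch.Tensor:
    torch.manual_seed(int(tokens.sum()))          # deterministic toy logits
    return torch.randn(*tokens.shape, VOCAB)

def autoregressive_decode(prompt: torch.Tensor, n_new: int) -> torch.Tensor:
    """One forward pass per token: n_new sequential model calls."""
    seq = prompt.clone()
    for _ in range(n_new):
        logits = toy_model(seq)
        next_tok = logits[:, -1].argmax(-1, keepdim=True)
        seq = torch.cat([seq, next_tok], dim=-1)
    return seq

def diffusion_decode(prompt: torch.Tensor, n_new: int, steps: int) -> torch.Tensor:
    """Start from an all-[MASK] block and commit the most confident
    positions each step: n_new tokens in `steps` model calls."""
    masks = torch.full((prompt.size(0), n_new), MASK_ID)
    seq = torch.cat([prompt, masks], dim=-1)
    per_step = -(-n_new // steps)                 # ceil division
    for _ in range(steps):
        logits = toy_model(seq)[:, -n_new:]
        conf, pred = logits.softmax(-1).max(-1)
        conf[seq[:, -n_new:] != MASK_ID] = -1.0   # never re-pick decoded slots
        idx = conf.topk(per_step, dim=-1).indices
        seq[:, -n_new:] = seq[:, -n_new:].scatter(-1, idx, pred.gather(-1, idx))
    return seq
```

With `n_new=64` and `steps=8`, the diffusion-style loop makes 8 model calls where the autoregressive loop makes 64; the trade-off is that each parallel commitment is made without seeing the tokens decoded in the same step, which is why confidence-based unmasking schedules matter in practice.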
Note: All implementations are based on published papers and publicly available code. Contributions and corrections are welcome via PR.