The generation speed of LLMs is bottlenecked by autoregressive decoding, where tokens are predicted sequentially, one at a time. Alternatively, diffusion large language models (dLLMs) theoretically allow ...
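The contrast between the two decoding regimes can be sketched with a toy example. This is a minimal illustration, not a real model: `toy_model` is a hypothetical stand-in for a forward pass, and the unmasking schedule is an assumption chosen only to show that a diffusion-style decoder can fill several positions per step while the autoregressive loop needs one forward pass per token.

```python
import math

VOCAB = ["the", "cat", "sat", "on", "mat"]
MASK = "<mask>"


def toy_model(context):
    # Hypothetical stand-in for an LLM forward pass:
    # deterministically maps the context length to a token.
    return VOCAB[len(context) % len(VOCAB)]


def autoregressive_decode(n_tokens):
    # Sequential decoding: each new token depends on all previous
    # tokens, so generating n_tokens costs n_tokens forward passes.
    out = []
    for _ in range(n_tokens):
        out.append(toy_model(out))
    return out


def diffusion_decode(n_tokens, steps=2):
    # Diffusion-style decoding: start from a fully masked sequence
    # and unmask a batch of positions per denoising step, so only
    # `steps` passes are needed regardless of sequence length.
    seq = [MASK] * n_tokens
    per_step = math.ceil(n_tokens / steps)
    for _ in range(steps):
        masked = [i for i, t in enumerate(seq) if t == MASK]
        for i in masked[:per_step]:
            seq[i] = toy_model(seq[:i])
    return seq
```

Under this sketch, `autoregressive_decode(8)` performs 8 model calls, while `diffusion_decode(8, steps=2)` fills the same 8 positions in 2 denoising steps; real dLLMs trade this parallelism against output quality.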
Note: All implementations are based on published papers and publicly available code. Contributions and corrections are welcome via PR.