Publications
See a full list on Google Scholar
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu, Peter Bailis, Ion Stoica, Hao Zhang ICML 2024 [paper] [blog] [code]
Efficient LLM Scheduling by Learning to Rank
Yichao Fu, Siqi Zhu, Runlong Su, Aurick Qiao, Ion Stoica, Hao Zhang NeurIPS 2024 [paper] [code]
Shiftaddllm: Accelerating pretrained llms via post-training multiplication-less reparameterization
Haoran You, Yipin Guo, Yichao Fu, Wei Zhou, Huihong Shi, Xiaofan Zhang, Souvik Kundu, Amir Yazdanbakhsh, Yingyan Celine Lin NeurIPS 2024 [paper] [code]
When linear attention meets autoregressive decoding: Towards more effective and efficient linearized large language models
Haoran You, Yichao Fu, Zheng Wang, Amir Yazdanbakhsh, Yingyan Celine Lin ICML 2024 [paper] [code]