About Me

I am a Ph.D. student in Computer Science & Engineering at UC San Diego, where I am fortunate to be advised by Prof. Hao Zhang. I previously obtained my B.S. in Computer Science from Zhejiang University.

My research focuses on distributed systems, machine learning systems, and efficient machine learning algorithms. Currently, I am designing optimized algorithms and systems for large language model (LLM) inference. My recent projects include Lookahead Decoding, vllm-ltr, and Dynasor🦖.

Education

  • Ph.D. in Computer Science, UC San Diego, 2029 (expected)
  • B.S. in Computer Science, Zhejiang University, 2022

Publications

See the full list on Google Scholar

Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

Yichao Fu, Peter Bailis, Ion Stoica, Hao Zhang. ICML 2024. [paper] [blog] [code]

Efficient LLM Scheduling by Learning to Rank

Yichao Fu, Siqi Zhu, Runlong Su, Aurick Qiao, Ion Stoica, Hao Zhang. NeurIPS 2024. [paper] [code]

ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization

Haoran You, Yipin Guo, Yichao Fu, Wei Zhou, Huihong Shi, Xiaofan Zhang, Souvik Kundu, Amir Yazdanbakhsh, Yingyan Celine Lin. NeurIPS 2024. [paper] [code]

When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models

Haoran You, Yichao Fu, Zheng Wang, Amir Yazdanbakhsh, Yingyan Celine Lin. ICML 2024. [paper] [code]

Internship

  • Summer 2021: Game Engine Developer
    • Game Engine Group of Aurora Studios, Tencent
    • Developed a fabric editor, implementing cloth rendering with GPU shaders and handling collision processing

Skills

  • Programming Languages: C++, Python, CUDA, Java
  • Tools: Linux, Git, LaTeX, PyTorch, JAX

Service

  • NeurIPS’24, ICLR’25, ICML’25, NeurIPS’25 Reviewer
  • OSDI’23, USENIX ATC’23 Artifact Evaluation Committee