About Me

I am a Ph.D. student in Computer Science & Engineering at UC San Diego, where I am fortunate to be advised by Prof. Hao Zhang. I previously obtained my B.S. in Computer Science from Zhejiang University.

My research focuses on distributed systems, machine learning systems, and efficient machine learning algorithms. Currently, I design optimized algorithms and systems for large language model (LLM) inference. My recent projects include DeepConf, Lookahead Decoding, and Dynasor🦖.

Education

  • Ph.D. in Computer Science, UC San Diego
  • B.S. in Computer Science, Zhejiang University, 2018-2022

Publications

See a full list on Google Scholar

Deep Think with Confidence

Yichao Fu, Xuewei Wang, Hao Zhang, Yuandong Tian, Jiawei Zhao arXiv 2025 [paper] [website] [code]

Efficiently Scaling LLM Reasoning with Certaindex

Yichao Fu, Junda Chen, Siqi Zhu, Zheyu Fu, Zhongdongming Dai, Yonghao Zhuang, Yian Ma, Aurick Qiao, Tajana Rosing, Ion Stoica, Hao Zhang NeurIPS 2025 [paper] [blog] [code]

Scaling Speculative Decoding with Lookahead Reasoning

Yichao Fu, Rui Ge, Zelei Shao, Zhijie Deng, Hao Zhang NeurIPS 2025 [paper] [code]

Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

Yichao Fu, Peter Bailis, Ion Stoica, Hao Zhang ICML 2024 [paper] [blog] [code]

Efficient LLM Scheduling by Learning to Rank

Yichao Fu, Siqi Zhu, Runlong Su, Aurick Qiao, Ion Stoica, Hao Zhang NeurIPS 2024 [paper] [code]

ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization

Haoran You, Yipin Guo, Yichao Fu, Wei Zhou, Huihong Shi, Xiaofan Zhang, Souvik Kundu, Amir Yazdanbakhsh, Yingyan Celine Lin NeurIPS 2024 [paper] [code]

When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models

Haoran You, Yichao Fu, Zheng Wang, Amir Yazdanbakhsh, Yingyan Celine Lin ICML 2024 [paper] [code]

Internship

  • Summer 2025: Research Scientist Intern
  • Summer 2021: Game Engine Developer
    • Game Engine Group of Aurora Studios, Tencent
    • Duties included developing a fabric editor: implementing cloth rendering with GPU shaders and handling collision processing

Skills

  • Programming Languages: C++, Python, CUDA, Java
  • Tools: Linux, Git, LaTeX, PyTorch, JAX

Service

  • NeurIPS’24, ICLR’25, ICML’25, NeurIPS’25 Reviewer
  • OSDI’23, USENIX ATC’23 Artifact Evaluation Committee