About Me

I am a Ph.D. student in Computer Science & Engineering at UC San Diego, where I am fortunate to be advised by Prof. Hao Zhang. I previously obtained my B.S. in Computer Science from Zhejiang University.

My research focuses on distributed systems, machine learning systems, and efficient machine learning algorithms. Currently, I work on designing optimized algorithms and systems for large language model (LLM) inference. My recent projects include DeepConf, Lookahead Decoding, and Dynasor🦖.

Education

  • Ph.D. in Computer Science, UCSD
  • B.S. in Computer Science, Zhejiang University, 2018-2022

Publications

Full list on Google Scholar.

Selected publications
Deep Think with Confidence
Yichao Fu, Xuewei Wang, Hao Zhang, Yuandong Tian, Jiawei Zhao
Scaling Speculative Decoding with Lookahead Reasoning
Yichao Fu, Rui Ge, Zelei Shao, Zhijie Deng, Hao Zhang
Efficiently Scaling LLM Reasoning with Certaindex
Yichao Fu, Junda Chen, Siqi Zhu, Zheyu Fu, Zhongdongming Dai, Yonghao Zhuang, Yian Ma, Aurick Qiao, Tajana Rosing, Ion Stoica, Hao Zhang
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu, Peter Bailis, Ion Stoica, Hao Zhang
Efficient LLM Scheduling by Learning to Rank
Yichao Fu, Siqi Zhu, Runlong Su, Aurick Qiao, Ion Stoica, Hao Zhang
FastKernels: Benchmarking GPU Kernel Generation in Production
Gabriele Oliaro, Yichao Fu, May Jiang, Owen Lu, Junli Wang, Zhihao Jia, Hao Zhang, Samyam Rajbhandari
Internalizing Agency from Reflective Experience
Rui Ge, Yichao Fu, Yuyang Qian, Junda Su, Yiming Zhao, Peng Zhao, Hao Zhang
When Drafts Evolve: Speculative Decoding Meets Online Learning
Yu-Yang Qian, Hao-Cong Wu, Yichao Fu, Hao Zhang, Peng Zhao
Fast and Accurate Causal Parallel Decoding using Jacobi Forcing
Lanxiang Hu, Siqi Kou, Yichao Fu, Samyam Rajbhandari, Tajana Rosing, Yuxiong He, Zhijie Deng, Hao Zhang
Reasoning Without Self-Doubt: More Efficient Chain-of-Thought Through Certainty Probing
Yichao Fu, Junda Chen, Yonghao Zhuang, Zheyu Fu, Ion Stoica, Hao Zhang
FoldMoE: Efficient Long Sequence MoE Training via Attention-MoE Pipelining
Guichao Zhu, Lintian Lei, Yuhao Qing, Yichao Fu, Fanxin Li, Dong Huang, Zekai Sun, Heming Cui
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
Haoran You, Yipin Guo, Yichao Fu, Wei Zhou, Huihong Shi, Xiaofan Zhang, Souvik Kundu, Amir Yazdanbakhsh, Yingyan Celine Lin
Neuron Sensitivity-Guided Test Case Selection
Dong Huang, Qingwen Bu, Yichao Fu, Yuhao Qing, Xiaofei Xie, Junjie Chen, Heming Cui
When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
Haoran You, Yichao Fu, Zheng Wang, Amir Yazdanbakhsh, Yingyan Celine Lin

Internship

  • Summer 2025: Research Scientist Intern
  • Summer 2021: Game Engine Developer
    • Game Engine Group of Aurora Studios, Tencent
    • Responsibilities: developed a fabric editor, implementing cloth rendering with GPU shaders and handling collision processing

Skills

  • Programming Languages: C++, Python, CUDA, Java
  • Tools: Linux, Git, LaTeX, PyTorch, JAX

Service

  • NeurIPS’24, ICLR’25, ICML’25, NeurIPS’25 Reviewer
  • OSDI’23, USENIX ATC’23 Artifact Evaluation Committee