Changyu Chen

I am a Ph.D. candidate in Computer Science at Singapore Management University (SMU), fortunate to be advised by Prof. Pradeep Varakantham and Prof. Arunesh Sinha. Prior to this, I earned my Master's degree in Systems and Project Management from Nanyang Technological University (NTU), and my Bachelor's degree in Civil Engineering from Zhejiang University.

I was fortunate to be a research intern at Sea AI Lab (SAIL), where I was mentored by Chao Du and Tianyu Pang and worked closely with Qian Liu and Min Lin.

My research interests lie at the intersection of generative modelling and autonomous decision making, with a current focus on the efficient and robust alignment of large language models.

Email  /  LinkedIn  /  Github  /  Google Scholar

News
Mar, 2025 Released our new project: Understanding R1-Zero-Like Training: A Critical Perspective, revealing the bias in GRPO and proposing a debiased alternative, Dr. GRPO.
Feb, 2025 Released our new project: There May Not be Aha Moment in R1-Zero-like Training, the first systematic study of self-reflection on open base models, followed by an in-depth analysis of R1-Zero-like training dynamics.
Jan, 2025 Recent work: Bootstrapping Language Models with DPO Implicit Rewards is accepted by ICLR 2025.
Jan, 2025 Recent work: Unlocking Large Language Model's Planning Capabilities with Maximum Diversity Fine-tuning is accepted by NAACL 2025.
Dec, 2024 Recent work: On Learning Informative Trajectory Embeddings for Imitation, Classification and Regression is accepted by AAMAS 2025.
Dec, 2024 Introduced Sailor2, a multilingual language model family, with a 20B chat model achieving a 50-50 win rate against GPT-4o in most SEA languages!
Oct, 2024 Recent work: Sample-Efficient Alignment for LLMs is accepted by NeurIPS 2024 Workshop on Language Gamification.
Jul, 2024 I was selected for the SMU Presidential Doctoral Fellowship and the SCIS Dean's List.
Dec, 2023 I received the Singapore Data Science Consortium Dissertation Research Fellowship 2023.
Selected Research
Sample-Efficient Alignment for LLMs
Zichen Liu, Changyu Chen, Chao Du, Wee Sun Lee, Min Lin
NeurIPS Workshop on Language Gamification (LanGame @ NeurIPS), 2024
arXiv / code


Bootstrapping Language Models with DPO Implicit Rewards
Changyu Chen*, Zichen Liu*, Chao Du, Tianyu Pang, Qian Liu, Arunesh Sinha, Pradeep Varakantham, Min Lin
International Conference on Learning Representations (ICLR), 2025
ICML Workshop on Models of Human Feedback for AI Alignment (MHFAIA @ ICML), 2024
arXiv / code


Unlocking Large Language Model's Planning Capabilities with Maximum Diversity Fine-tuning
Wenjun Li, Changyu Chen, Pradeep Varakantham
Findings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-Findings), 2025
arXiv

On Learning Informative Trajectory Embeddings for Imitation, Classification and Regression
Zichang Ge*, Changyu Chen*, Arunesh Sinha, Pradeep Varakantham
International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2025
arXiv / code

Generative Modelling of Stochastic Actions with Arbitrary Constraints in Reinforcement Learning
Changyu Chen, Ramesha Karunasena, Thanh Hong Nguyen, Arunesh Sinha, Pradeep Varakantham
Advances in Neural Information Processing Systems (NeurIPS), 2023
project page / arXiv / code

Multiscale Generative Models: Improving Performance of a Generative Model Using Feedback from Other Dependent Generative Models
Changyu Chen, Avinandan Bose, Shih-Fen Cheng, Arunesh Sinha
Annual AAAI Conference on Artificial Intelligence (AAAI), 2022
arXiv / code

* denotes equal contribution

Open-source Project
Online Alignment for LLMs (Oat 🌾)

Oat 🌾 is a training framework for LLMs with online alignment algorithms. It features a distributed Actor-Learner-Oracle architecture optimized for scalability and efficiency, integrating accelerated response sampling with vLLM, memory-efficient learning via DeepSpeed ZeRO, and dynamic Oracle services powered by Mosec. Oat supports cutting-edge online alignment algorithms such as SEA, APL, and XPO, alongside mainstream offline algorithms such as DPO and SimPO. I was proud to contribute to Oat as part of its development team, led by Kevin.


Thanks to Jon Barron for sharing his website's source code.