Changyu Chen

I am a Ph.D. candidate in Computer Science at Singapore Management University (SMU), fortunate to be advised by Prof. Pradeep Varakantham and Prof. Arunesh Sinha. Prior to this, I earned my Master's degree in Systems and Project Management from Nanyang Technological University (NTU), and my Bachelor's degree in Civil Engineering from Zhejiang University.

I was fortunate to be a research intern at Sea AI Lab (SAIL), where I was mentored by Chao Du and Tianyu Pang and worked closely with Qian Liu and Min Lin.

My research interests lie at the intersection of generative modelling and autonomous decision making, with a current focus on the efficient and robust alignment of large language models.

Email  /  LinkedIn  /  Github  /  Google Scholar

News
Mar, 2025 Released our new project: Understanding R1-Zero-Like Training: A Critical Perspective, revealing the bias in GRPO and proposing a debiased alternative, Dr. GRPO.
Feb, 2025 Released our new project: There May Not be Aha Moment in R1-Zero-like Training, the first systematic study of self-reflection on open base models, followed by an in-depth analysis of R1-Zero-like training dynamics.
Jan, 2025 Recent work: Bootstrapping Language Models with DPO Implicit Rewards is accepted by ICLR 2025.
Jan, 2025 Recent work: Unlocking Large Language Model's Planning Capabilities with Maximum Diversity Fine-tuning is accepted by NAACL 2025.
Dec, 2024 Recent work: On Learning Informative Trajectory Embeddings for Imitation, Classification and Regression is accepted by AAMAS 2025.
Dec, 2024 Introduced Sailor2, a multilingual language model family, with a 20B chat model achieving a 50-50 win rate against GPT-4o in most SEA languages!
Oct, 2024 Recent work: Sample-Efficient Alignment for LLMs is accepted by NeurIPS 2024 Workshop on Language Gamification.
Jul, 2024 I was selected for the SMU Presidential Doctoral Fellowship and the SCIS Dean's List.
Dec, 2023 I received the Singapore Data Science Consortium Dissertation Research Fellowship 2023.
Selected Research
Sample-Efficient Alignment for LLMs
Zichen Liu, Changyu Chen, Chao Du, Wee Sun Lee, Min Lin
NeurIPS Workshop on Language Gamification (LanGame @ NeurIPS), 2024
arXiv / code


Bootstrapping Language Models with DPO Implicit Rewards
Changyu Chen*, Zichen Liu*, Chao Du, Tianyu Pang, Qian Liu, Arunesh Sinha, Pradeep Varakantham, Min Lin
International Conference on Learning Representations (ICLR), 2025
ICML Workshop on Models of Human Feedback for AI Alignment (MHFAIA @ ICML), 2024
arXiv / code


Unlocking Large Language Model's Planning Capabilities with Maximum Diversity Fine-tuning
Wenjun Li, Changyu Chen, Pradeep Varakantham
Findings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-Findings), 2025
arXiv

On Learning Informative Trajectory Embeddings for Imitation, Classification and Regression
Zichang Ge*, Changyu Chen*, Arunesh Sinha, Pradeep Varakantham
International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2025
arXiv / code

Generative Modelling of Stochastic Actions with Arbitrary Constraints in Reinforcement Learning
Changyu Chen, Ramesha Karunasena, Thanh Hong Nguyen, Arunesh Sinha, Pradeep Varakantham
Advances in Neural Information Processing Systems (NeurIPS), 2023
project page / arXiv / code

Multiscale Generative Models: Improving Performance of a Generative Model Using Feedback from Other Dependent Generative Models
Changyu Chen, Avinandan Bose, Shih-Fen Cheng, Arunesh Sinha
Annual AAAI Conference on Artificial Intelligence (AAAI), 2022
arXiv / code

* denotes equal contribution

Open-source Project
Online Alignment for LLMs (Oat 🌾)

Oat 🌾 is a training framework for LLMs with online alignment algorithms. It features a distributed Actor-Learner-Oracle architecture optimized for scalability and efficiency, integrating accelerated response sampling with vLLM, memory-efficient learning via DeepSpeed ZeRO, and dynamic Oracle services powered by Mosec. Oat supports cutting-edge online alignment algorithms such as SEA, APL, and XPO, alongside mainstream offline algorithms such as DPO and SimPO. I was proud to contribute to Oat as part of its development team, led by Kevin.


Thanks to Jon Barron for sharing his website's source code.