Mar, 2025 |
Released our new project: Understanding R1-Zero-Like Training: A Critical Perspective,
revealing the bias in GRPO and proposing a debiased alternative, Dr. GRPO. |
Feb, 2025 |
Released our new project: There May Not be Aha Moment in R1-Zero-like Training,
the first systematic study of self-reflection on open base models, followed by an in-depth analysis of
R1-Zero-like training dynamics. |
Jan, 2025 |
Recent work: Bootstrapping Language Models with DPO Implicit Rewards
was accepted at ICLR 2025. |
Jan, 2025 |
Recent work: Unlocking Large Language Model's Planning Capabilities
with Maximum Diversity Fine-tuning was accepted at NAACL 2025. |
Dec, 2024 |
Recent work: On Learning Informative
Trajectory Embeddings for Imitation, Classification and Regression was accepted at AAMAS 2025.
|
Dec, 2024 |
Introduced Sailor2, a multilingual language model family,
with a 20B chat model achieving a 50-50 win rate against GPT-4o in most Southeast Asian (SEA) languages!
|
Oct, 2024 |
Recent work: Sample-Efficient Alignment for LLMs was accepted at the
NeurIPS 2024 Workshop on Language Gamification. |
Jul, 2024 |
I was selected for the SMU Presidential Doctoral Fellowship
and the SCIS Dean's List. |
Dec, 2023 |
I received the Singapore Data Science Consortium
Dissertation Research Fellowship 2023. |
|
|
Sample-Efficient Alignment for LLMs
Zichen Liu,
Changyu Chen,
Chao Du†,
Wee Sun Lee,
Min Lin
LanGame @ Advances in Neural Information Processing Systems (LanGame @
NeurIPS), 2024
arXiv /
code
|
|
Bootstrapping Language Models with DPO Implicit Rewards
Changyu Chen*,
Zichen Liu*,
Chao Du†,
Tianyu Pang,
Qian Liu,
Arunesh Sinha†,
Pradeep Varakantham†,
Min Lin
International Conference on Learning Representations (ICLR), 2025
MHFAIA @ International Conference on Machine Learning (MHFAIA @ ICML), 2024
arXiv /
code
|
|
Unlocking Large Language Model's Planning Capabilities with Maximum Diversity Fine-tuning
Wenjun Li,
Changyu Chen,
Pradeep Varakantham
Findings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-Findings), 2025
arXiv
|
|
On Learning Informative Trajectory Embeddings for Imitation, Classification and
Regression
Zichang Ge*,
Changyu Chen*,
Arunesh Sinha,
Pradeep Varakantham
International Conference on Autonomous Agents and Multiagent Systems
(AAMAS), 2025
arXiv /
code
|
|
Generative Modelling of Stochastic Actions with Arbitrary Constraints in
Reinforcement
Learning
Changyu Chen,
Ramesha Karunasena,
Thanh Hong Nguyen,
Arunesh Sinha,
Pradeep Varakantham
Advances in Neural Information Processing Systems (NeurIPS), 2023
project page /
arXiv /
code
|
|
Multiscale Generative Models: Improving Performance of a Generative Model Using
Feedback
from Other Dependent Generative Models
Changyu Chen,
Avinandan Bose,
Shih-Fen Cheng,
Arunesh Sinha
Annual AAAI Conference on Artificial Intelligence (AAAI), 2022
arXiv /
code
|
† denotes corresponding author
* denotes equal contribution
|
|
Online Alignment for LLMs (Oat 🌾)
Oat 🌾 is a training framework for LLMs with online alignment algorithms. It features a distributed
Actor-Learner-Oracle architecture optimized for scalability and efficiency, integrating accelerated
response sampling with vLLM, memory-efficient
learning via DeepSpeed ZeRO, and dynamic Oracle
services powered by Mosec. Oat supports cutting-edge
online alignment algorithms such as SEA, APL, and XPO, alongside mainstream offline algorithms like
DPO and SimPO. I am proud to have contributed to Oat as part of the development team led by Kevin.
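The Actor-Learner-Oracle split can be illustrated with a minimal single-process sketch. This is a hypothetical simplification with illustrative names only, not Oat's API: Oat actually distributes these roles across vLLM-backed actors, DeepSpeed-ZeRO learners, and Mosec-served oracles.

```python
# Toy single-process sketch of an Actor-Learner-Oracle alignment loop.
# All function names and the toy preference rule are illustrative.

def actor_sample(policy, prompt, n=2):
    """Actor: draw n candidate responses from the current policy."""
    return [f"{prompt}::response-v{policy['version']}-{i}" for i in range(n)]

def oracle_rank(responses):
    """Oracle: score candidates and return a (chosen, rejected) pair.
    Toy preference: prefer the lexicographically larger string."""
    ranked = sorted(responses, reverse=True)
    return ranked[0], ranked[-1]

def learner_update(policy, chosen, rejected):
    """Learner: update the policy from the preference pair
    (a real learner would take a DPO/SEA-style gradient step)."""
    policy["version"] += 1
    policy["pairs"].append((chosen, rejected))
    return policy

def online_alignment(prompts, steps):
    """Run the actor -> oracle -> learner loop for a number of steps."""
    policy = {"version": 0, "pairs": []}
    for t in range(steps):
        prompt = prompts[t % len(prompts)]
        responses = actor_sample(policy, prompt)
        chosen, rejected = oracle_rank(responses)
        policy = learner_update(policy, chosen, rejected)
    return policy

policy = online_alignment(["hello", "world"], steps=4)
print(policy["version"])  # → 4
```

The point of the split is that each role scales independently: actors are throughput-bound (hence vLLM), learners are memory-bound (hence ZeRO), and oracles are latency-bound services (hence Mosec).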
|
|