CPPO paper released on arXiv
Our paper, “Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning,” is now available on arXiv. Our proposed CPPO achieves 54.79% Avg@16 on AIME24/25/26 with Qwen3-30B-A3B-Base, significantly outperforming GRPO-like baselines. Paper link · Project page.