Tag: grpo

All the articles with the tag "grpo".

GRPO and DAPO: A Deep Dive into RL for Reasoning LLMs

28 May, 2026

An end-to-end walkthrough of Group Relative Policy Optimization (GRPO) and Decoupled Clip and Dynamic sAmpling Policy Optimization (DAPO), the two RL algorithms that drive open reasoning models in 2025–2026. Full math, every design choice motivated, and a head-to-head comparison.

From GRPO to GSPO: Group-Based Policy Optimization for LLMs

28 May, 2026

A complete walkthrough of Group Relative Policy Optimization (GRPO) and Group Sequence Policy Optimization (GSPO) — the policy-gradient algorithms behind DeepSeek-R1 and Qwen3. Full math, the failure mode that motivated GSPO, the MoE story, and a side-by-side comparison.
