Tag: qwen
All the articles with the tag "qwen".
From GRPO to GSPO: Group-Based Policy Optimization for LLMs
A complete walkthrough of Group Relative Policy Optimization (GRPO) and Group Sequence Policy Optimization (GSPO), the policy-gradient algorithms behind DeepSeek-R1 and Qwen3, respectively. Covers the full math, the failure mode that motivated GSPO, the Mixture-of-Experts (MoE) story, and a side-by-side comparison.