Tag: llm-training
All the articles with the tag "llm-training".
- GRPO and DAPO: A Deep Dive into RL for Reasoning LLMs
An end-to-end walkthrough of Group Relative Policy Optimization (GRPO) and Decoupled Clip and Dynamic sAmpling Policy Optimization (DAPO), the two RL algorithms that drive open reasoning models in 2025–2026. Covers the full math, the motivation behind every design choice, and a head-to-head comparison.