Posts

All the articles I've posted.

Attention Residuals: Softmax Attention Over Depth

1 Jun, 2026

A deep dive into the Kimi team's Attention Residuals (AttnRes), which replaces the fixed-weight residual connection with learned softmax attention over depth. Covers the time–depth duality, Full vs Block AttnRes, the structured-matrix view that unifies prior residual variants, the pipeline-parallel infra that makes it practical, and the scaling-law and 48B-MoE results.

GRPO and DAPO: A Deep Dive into RL for Reasoning LLMs

28 May, 2026

An end-to-end walkthrough of Group Relative Policy Optimization (GRPO) and Decoupled Clip and Dynamic sAmpling Policy Optimization (DAPO), the two RL algorithms that drive open reasoning models in 2025–2026. Full math, every design choice motivated, and a head-to-head comparison.

From GRPO to GSPO: Group-Based Policy Optimization for LLMs

28 May, 2026

A complete walkthrough of Group Relative Policy Optimization (GRPO) and Group Sequence Policy Optimization (GSPO), the policy-gradient algorithms behind DeepSeek-R1 and Qwen3. Full math, the failure mode that motivated GSPO, the MoE story, and a side-by-side comparison.

GRPO and Dr.GRPO: The Math, the Biases, and the Fix

28 May, 2026

An end-to-end derivation of Group Relative Policy Optimization (GRPO) from DeepSeekMath and the Dr.GRPO correction from Liu et al. Covers the full objective, the gradient, the two biases (length and question difficulty), the unbiased fix, and the practical recipe behind R1-Zero–style training.
