Tag: paper-review
All articles tagged "paper-review".
-
GRPO and Dr.GRPO: The Math, the Biases, and the Fix
An end-to-end derivation of Group Relative Policy Optimization (GRPO) from DeepSeekMath and the Dr.GRPO correction from Liu et al. Covers the full objective, the gradient, the two biases (length and question difficulty), the unbiased fix, and the practical recipe behind R1-Zero–style training.
-
Hybrid Attention and MLA: The Tradeoff
A side-by-side comparison of Xiaomi MiMo's hybrid sliding-window/global attention and DeepSeek's Multi-head Latent Attention. Both answer the same question (how to make attention affordable at long context) with very different bets, and those bets shape everything from training infra to KV cache size.
-
Kimi K2.5: Joint Text–Vision Training and the Agent Swarm
A walkthrough of two ideas behind Kimi K2.5: how joint text–vision pre-training and RL make each modality help the other, and how Agent Swarm replaces sequential tool use with a learned parallel orchestrator.
-
Inside DeepSeek's Sparse Attention: From NSA to DSA
A deep dive into DeepSeek's two sparse attention designs — Native Sparse Attention (NSA) and DeepSeek Sparse Attention (DSA) — covering the math, the hardware story, and why DSA in V3.2 looks so different from NSA.