Tag: paper-review
All articles tagged "paper-review".
-
From AlexNet to World Models: The Evolution of Multi-Modal Neural Networks
A ground-up tour of how neural networks learned to see, then to see-and-read, and finally to imagine. From AlexNet and CNNs, through CLIP and the vision-language models behind GPT-4V, to world models like Dreamer, V-JEPA 2, and LeWorldModel — with architectures, math, and benchmark numbers along the way.
-
Attention Residuals: Softmax Attention Over Depth
A deep dive into the Kimi team's Attention Residuals (AttnRes) — replacing the fixed-weight residual connection with learned softmax attention over depth. Covers the time–depth duality, Full vs Block AttnRes, the structured-matrix view that unifies prior residual variants, the pipeline-parallel infra that makes it practical, and the scaling-law and 48B-MoE results.
-
GRPO and DAPO: A Deep Dive into RL for Reasoning LLMs
An end-to-end walkthrough of Group Relative Policy Optimization (GRPO) and Decoupled Clip and Dynamic sAmpling Policy Optimization (DAPO) — the two RL algorithms that drive open reasoning models in 2025–2026. Full math, every design choice motivated, and a head-to-head comparison.
-
From GRPO to GSPO: Group-Based Policy Optimization for LLMs
A complete walkthrough of Group Relative Policy Optimization (GRPO) and Group Sequence Policy Optimization (GSPO) — the policy-gradient algorithms behind DeepSeek-R1 and Qwen3. Full math, the failure mode that motivated GSPO, the MoE story, and a side-by-side comparison.