Posts
All the articles I've posted.
-
Top Interview 150 — Solutions in Python
Worked Python solutions to the LeetCode Top Interview 150, organized by topic with a short approach for each problem.
-
From AlexNet to World Models: The Evolution of Multi-Modal Neural Networks
A ground-up tour of how neural networks learned to see, then to see-and-read, and finally to imagine. From AlexNet and CNNs, through CLIP and the vision-language models behind GPT-4V, to world models like Dreamer, V-JEPA 2, and LeWorldModel — with architectures, math, and benchmark numbers along the way.
-
Attention Residuals: Softmax Attention Over Depth
A deep dive into the Kimi team's Attention Residuals (AttnRes) — replacing the fixed-weight residual connection with learned softmax attention over depth. Covers the time–depth duality, Full vs Block AttnRes, the structured-matrix view that unifies prior residual variants, the pipeline-parallel infra that makes it practical, and the scaling-law and 48B-MoE results.
-
GRPO and DAPO: A Deep Dive into RL for Reasoning LLMs
An end-to-end walkthrough of Group Relative Policy Optimization (GRPO) and Decoupled Clip and Dynamic sAmpling Policy Optimization (DAPO), two RL algorithms widely used to train open reasoning models in 2025–2026. Full math, every design choice motivated, and a head-to-head comparison.