Tag: paper-review

All the articles with the tag "paper-review".

Hybrid Attention and MLA: The Tradeoff

23 May, 2026

A side-by-side dive into Xiaomi MiMo's hybrid sliding-window/global attention and DeepSeek's Multi-head Latent Attention. The two answer the same question — how to make attention affordable at long context — with very different bets, and those bets shape everything from training infra to KV cache size.

Kimi K2.5: Joint Text–Vision Training and the Agent Swarm

19 May, 2026

A walkthrough of two ideas behind Kimi K2.5: how joint text–vision pre-training and RL make each modality help the other, and how Agent Swarm replaces sequential tool use with a learned parallel orchestrator.

Inside DeepSeek's Sparse Attention: From NSA to DSA

18 May, 2026

A deep dive into DeepSeek's two sparse attention designs — Native Sparse Attention (NSA) and DeepSeek Sparse Attention (DSA) — covering the math, the hardware story, and why DSA in V3.2 looks so different from NSA.