Tag: kimi

All the articles with the tag "kimi".

Attention Residuals: Softmax Attention Over Depth

1 Jun, 2026

A deep dive into the Kimi team's Attention Residuals (AttnRes), which replace the fixed-weight residual connection with learned softmax attention over depth. Covers the time–depth duality, Full vs Block AttnRes, the structured-matrix view that unifies prior residual variants, the pipeline-parallel infrastructure that makes it practical, and the scaling-law and 48B-MoE results.

Kimi K2.5: Joint Text–Vision Training and the Agent Swarm

19 May, 2026

A walkthrough of two ideas behind Kimi K2.5: how joint text–vision pre-training and RL make each modality help the other, and how Agent Swarm replaces sequential tool use with a learned parallel orchestrator.
