Tag: reinforcement-learning
All the articles with the tag "reinforcement-learning".
-
Training Composer 2: How Cursor Builds a Coding Agent Model
A structured walkthrough of Sasha Rush's Training Composer 2 workshop: why Cursor chose Kimi K2.5, how continued pretraining and long-horizon RL fit together, what CursorBench measures, and where Composer is headed.
-
Kimi K2.5: Joint Text–Vision Training and the Agent Swarm
A walkthrough of two ideas behind Kimi K2.5: how joint text–vision pre-training and RL make each modality help the other, and how Agent Swarm replaces sequential tool use with a learned parallel orchestrator.