Posts
All the articles I've posted.
-
Inside GLM-5.2: IndexShare, KVShare, and the End-to-End TV Loss
A deep dive into GLM-5.2, a 753B open-weight MoE that serves a 1M-token context. We walk through the three innovations that make it cheap to run: IndexShare (cross-layer sparse-attention index reuse), KVShare + rejection sampling for speculative decoding, and a novel end-to-end TV loss that breaks the entropy bound on MTP acceptance. Plus the slime RL stack behind its long-horizon agentic skills.
-
Mastra vs Agno: two agent frameworks with very different centers of gravity
A practical comparison of Mastra and Agno, covering developer experience, agents, teams, workflows, memory, RAG, observability, deployment, and when to choose each framework.
-
Difference Between On-Policy Distillation and Reinforcement Learning
An in-depth analysis comparing on-policy distillation with reinforcement learning.
-
HRM-Text & MagicNorm: Pretraining a 1B Language Model for ~$1,500
A walkthrough of HRM-Text: Efficient Pretraining Beyond Scaling — the biologically-inspired Hierarchical Recurrent Model that swaps the Transformer for a dual-timescale recurrent core, and MagicNorm, the normalization trick that makes that deep recurrence trainable by exploiting the forward/backward asymmetry of truncated backpropagation through time.