Posts

All the articles I've posted.

Inside DSpark: DeepSeek's Confidence-Scheduled Speculative Decoding

28 Jun, 2026

A deep dive into DSpark, DeepSeek's new draft model for speculative decoding. We cover what it actually is — a semi-autoregressive drafter paired with a confidence-scheduled, load-aware verifier — how it differs from vanilla speculative decoding, Medusa, EAGLE-3 and parallel drafters like DFlash, and why it delivers 60–85% faster per-user generation inside the DeepSeek-V4 serving stack.
Inside GLM-5.2: IndexShare, KVShare, and the End-to-End TV Loss

21 Jun, 2026

A deep dive into GLM-5.2, a 753B open-weight MoE that serves a 1M-token context. We walk through the three innovations that make it cheap to run: IndexShare (cross-layer sparse-attention index reuse), KVShare + rejection sampling for speculative decoding, and a novel end-to-end TV loss that breaks the entropy bound on MTP acceptance. We also cover the slime RL stack behind its long-horizon agentic skills.
Mastra vs Agno: two agent frameworks with very different centers of gravity

17 Jun, 2026

A practical comparison of Mastra and Agno, covering developer experience, agents, teams, workflows, memory, RAG, observability, deployment, and when to choose each framework.
Difference Between On-Policy Distillation and Reinforcement Learning

17 Jun, 2026

An in-depth comparison of on-policy distillation and reinforcement learning.
