Tag: llm

All the articles with the tag "llm".

Inside DSpark: DeepSeek's Confidence-Scheduled Speculative Decoding

28 Jun, 2026

A deep dive into DSpark, DeepSeek's new draft model for speculative decoding. We cover what it actually is (a semi-autoregressive drafter paired with a confidence-scheduled, load-aware verifier), how it differs from vanilla speculative decoding, Medusa, EAGLE-3, and parallel drafters like DFlash, and why it delivers 60–85% faster per-user generation inside the DeepSeek-V4 serving stack.

From GRPO to GSPO: Group-Based Policy Optimization for LLMs

28 May, 2026

A complete walkthrough of Group Relative Policy Optimization (GRPO) and Group Sequence Policy Optimization (GSPO), the policy-gradient algorithms behind DeepSeek-R1 and Qwen3. Full math, the failure mode that motivated GSPO, the MoE story, and a side-by-side comparison.

GRPO and Dr.GRPO: The Math, the Biases, and the Fix

28 May, 2026

An end-to-end derivation of Group Relative Policy Optimization (GRPO) from DeepSeekMath and the Dr.GRPO correction from Liu et al. Covers the full objective, the gradient, the two biases (length and question difficulty), the unbiased fix, and the practical recipe behind R1-Zero–style training.
