Tag: inference

All the articles with the tag "inference".

Inside DSpark: DeepSeek's Confidence-Scheduled Speculative Decoding

28 Jun, 2026

A deep dive into DSpark, DeepSeek's new draft model for speculative decoding. We cover what it is (a semi-autoregressive drafter paired with a confidence-scheduled, load-aware verifier), how it differs from vanilla speculative decoding, Medusa, EAGLE-3, and parallel drafters like DFlash, and why it delivers 60–85% faster per-user generation inside the DeepSeek-V4 serving stack.
