Tag: deep-learning

All the articles with the tag "deep-learning".

Inside DSpark: DeepSeek's Confidence-Scheduled Speculative Decoding

28 Jun, 2026

A deep dive into DSpark, DeepSeek's new draft model for speculative decoding. We cover what it actually is (a semi-autoregressive drafter paired with a confidence-scheduled, load-aware verifier), how it differs from vanilla speculative decoding, Medusa, EAGLE-3 and parallel drafters like DFlash, and why it delivers 60–85% faster per-user generation inside the DeepSeek-V4 serving stack.

Inside GLM-5.2: IndexShare, KVShare, and the End-to-End TV Loss

21 Jun, 2026

A deep dive into GLM-5.2, a 753B open-weight MoE that serves a 1M-token context. We walk through the three innovations that make it cheap to run: IndexShare (cross-layer sparse-attention index reuse), KVShare + rejection sampling for speculative decoding, and a novel end-to-end TV loss that breaks the entropy bound on MTP acceptance. Plus the slime RL stack behind its long-horizon agentic skills.

HRM-Text & MagicNorm: Pretraining a 1B Language Model for ~$1,500

9 Jun, 2026

A walkthrough of HRM-Text: Efficient Pretraining Beyond Scaling, the biologically inspired Hierarchical Recurrent Model that swaps the Transformer for a dual-timescale recurrent core, and MagicNorm, the normalization trick that makes that deep recurrence trainable by exploiting the forward/backward asymmetry of truncated backpropagation through time.

From AlexNet to World Models: The Evolution of Multi-Modal Neural Networks

2 Jun, 2026

A ground-up tour of how neural networks learned to see, then to see-and-read, and finally to imagine. From AlexNet and CNNs, through CLIP and the vision-language models behind GPT-4V, to world models like Dreamer, V-JEPA 2, and LeWorldModel — with architectures, math, and benchmark numbers along the way.
