Tag: language-models
All the articles with the tag "language-models".
- HRM-Text & MagicNorm: Pretraining a 1B Language Model for ~$1,500
A walkthrough of "HRM-Text: Efficient Pretraining Beyond Scaling". HRM-Text is the biologically inspired Hierarchical Recurrent Model that swaps the Transformer for a dual-timescale recurrent core. MagicNorm is the normalization trick that makes that deep recurrence trainable by exploiting the forward/backward asymmetry of truncated backpropagation through time.