Speculative Decoding
Algorithm of Speculative Decoding Step
Algorithm SpeculativeDecodingStep
Inputs: .
Sample guesses from autoregressively.
for i = 1 to do
β
β
end for
Run in parallel.
Determine the number of accepted guesses .
Adjust the distribution from if needed.
if then β
end if
Return one token from , and tokens from .
return
Correctness of Speculative Sampling
We will now show that for any distributions and , the tokens sampled via speculative sampling from and
are distributed identically to those sampled from alone. Let be the acceptance probability (Definition).
Note that as
the normalizing constant for the adjusted distribution is , where the last equation follows immediately from Lemma 3.3 and Theorem 3.5.
Now:
Where:
And:
Overall:
As desired.
Definition
The acceptance rate , given a prefix , is the probability of accepting by speculative sampling.
Lemma
Define
where . Then
Proof.