Speculative Decoding

paper

Algorithm of Speculative Decoding Step


Algorithm SpeculativeDecodingStep

Inputs: .

Sample guesses from autoregressively.

for i = 1 to do
 
 
end for

Run in parallel.

Determine the number of accepted guesses .

Adjust the distribution from if needed.

if then  

end if

Return one token from , and tokens from .

return


Correctness of Speculative Sampling

We will now show that for any distributions and , the tokens sampled via speculative sampling from and

are distributed identically to those sampled from alone. Let be the acceptance probability (Definition).

Note that as

the normalizing constant for the adjusted distribution is , where the last equation follows immediately from Lemma 3.3 and Theorem 3.5.

Now:

Where:

And:

Overall:

As desired.

Definition

The acceptance rate , given a prefix , is the probability of accepting by speculative sampling.

Lemma

Define

where . Then

Proof.