Absolute Positional Encoding

Relative Positional Encoding

ALiBi

RoPE

NoPE

Ready for some math?



Consider an image or feature map $X \in \mathbb{R}^{d \times n}$, where $n$ denotes the spatial dimension and $d$ denotes the number of features. Let $\pi$ denote a permutation of $n$ elements. A transformation $T: \mathbb{R}^{d \times n} \to \mathbb{R}^{d \times n}$ is called a spatial permutation if $T(X) = X P_\pi$, where $P_\pi \in \mathbb{R}^{n \times n}$ denotes the permutation matrix associated with $\pi$, defined as $P_\pi = [e_{\pi(1)}, e_{\pi(2)}, \cdots, e_{\pi(n)}]$ with $e_i$ being the one-hot vector of length $n$ whose $i$-th element is 1.
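As a quick illustration, here is a minimal NumPy sketch of this definition; the shapes, random seed, and variable names are my own choices rather than anything from the text. It builds $P_\pi$ column by column and confirms that $T(X) = XP_\pi$ simply reorders the columns (spatial positions) of $X$.

```python
# Sketch only: constructing P_pi and applying the spatial permutation T(X) = X P_pi.
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 6                       # number of features, spatial dimension
X = rng.normal(size=(d, n))       # feature map X in R^{d x n}

pi = rng.permutation(n)           # a permutation of n elements
P = np.eye(n)[:, pi]              # P_pi = [e_pi(1), e_pi(2), ..., e_pi(n)]

# T(X) = X P_pi reorders the columns (spatial positions) of X.
assert np.allclose(X @ P, X[:, pi])
# P_pi is orthogonal: P_pi^T P_pi = I.
assert np.allclose(P.T @ P, np.eye(n))
```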

Definition

An operator $A: \mathbb{R}^{d \times n} \to \mathbb{R}^{d \times n}$ is spatial permutation equivariant if $T_\pi(A(X)) = A(T_\pi(X))$ for any $X$ and any spatial permutation $T_\pi$. In addition, an operator $A: \mathbb{R}^{d \times n} \to \mathbb{R}^{d \times n}$ is spatial permutation invariant if $A(T_\pi(X)) = A(X)$ for any $X$ and any spatial permutation $T_\pi$.
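To make the two definitions concrete, the following sketch checks them on two toy operators of my own choosing (they do not appear in the text): an elementwise nonlinearity, which is spatial permutation equivariant, and spatial mean pooling broadcast back to every position, which is spatial permutation invariant.

```python
# Sketch only: toy operators illustrating equivariance vs. invariance.
import numpy as np

rng = np.random.default_rng(1)
d, n = 4, 6
X = rng.normal(size=(d, n))
pi = rng.permutation(n)
P = np.eye(n)[:, pi]
T = lambda Z: Z @ P                                # spatial permutation T_pi

relu = lambda Z: np.maximum(Z, 0.0)                # acts on each position independently
pool = lambda Z: np.repeat(Z.mean(axis=1, keepdims=True), Z.shape[1], axis=1)

assert np.allclose(relu(T(X)), T(relu(X)))         # equivariant: A(T(X)) = T(A(X))
assert np.allclose(pool(T(X)), pool(X))            # invariant:   A(T(X)) = A(X)
```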

Claim

A self-attention operator $A_s$ is permutation equivariant, while an attention operator with a learned query $A_Q$ is permutation invariant. In particular, denoting by $X$ the input matrix and by $T_\pi$ any spatial permutation, we have

ðī𝑠(𝑇𝜋(𝑋))=𝑇𝜋(ðī𝑠(𝑋)),

and

ðī𝑄(𝑇𝜋(𝑋))=ðī𝑄(𝑋).

When applying a spatial permutation $T_\pi$ to the input $X$ of a self-attention operator $A_s$, we have

ðī𝑠(𝑇𝜋(𝑋))=𝑊ð‘Ģ𝑇𝜋(𝑋)⋅softmax((𝑊𝑘𝑇𝜋(𝑋))𝑇⋅𝑊ð‘Ģ𝑇𝜋(𝑋))=𝑊ð‘Ģ𝑋𝑃𝜋⋅softmax((𝑊𝑘𝑋𝑃𝜋)𝑇⋅𝑊𝑞𝑋𝑃𝜋)=𝑊ð‘Ģ𝑋𝑃𝜋⋅softmax𝑃𝑇(𝑊𝑘𝑋)𝑇𝜋⋅𝑊𝑞𝑋𝑃𝜋)=𝑊ð‘Ģ𝑋𝑃𝜋𝑃𝑇𝜋⋅softmax((𝑊𝑘𝑋)𝑇⋅𝑊𝑞𝑋)𝑃𝜋=𝑊ð‘Ģ𝑋⋅softmax((𝑊𝑘𝑋)𝑇⋅𝑊𝑞𝑋)𝑃𝜋=𝑇𝜋(ðī𝑠(𝑋)).

Note that $P_\pi^T P_\pi = I$ since $P_\pi$ is an orthogonal matrix. It is also easy to verify that

$$\mathrm{softmax}(P_\pi^T M P_\pi) = P_\pi^T \, \mathrm{softmax}(M) \, P_\pi$$

for any matrix $M$; the same argument gives $\mathrm{softmax}(P_\pi^T M) = P_\pi^T \, \mathrm{softmax}(M)$ when the softmax is applied column-wise, i.e. over keys. Hence $A_s$ is spatial permutation equivariant. Similarly, when applying $T_\pi$ to the input of an attention operator $A_Q$ with a learned query $Q$, which is independent of the input $X$, we have

ðī𝑄(𝑇𝜋(𝑋))=𝑊ð‘Ģ𝑇𝜋(𝑋)⋅softmax((𝑊𝑘𝑇𝜋(𝑋))𝑇⋅𝑄)=𝑊ð‘Ģ𝑋(𝑃𝜋𝑃𝑇𝜋)⋅softmax((𝑊𝑘𝑋)𝑇⋅𝑄)=𝑊ð‘Ģ𝑋⋅softmax((𝑊𝑘𝑋)𝑇⋅𝑄)=ðī𝑄(𝑋).

Hence $A_Q$ is spatial permutation invariant.
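The same kind of numerical check covers the remaining two ingredients: the softmax identity used above and the invariance of $A_Q$. As before, the column-wise softmax, the shapes, and the randomly drawn $Q$ (standing in for a learned query) are illustrative assumptions on my part.

```python
# Sketch only: checking softmax(P^T M P) = P^T softmax(M) P and A_Q(X P_pi) = A_Q(X).
import numpy as np

def softmax_cols(M):
    E = np.exp(M - M.max(axis=0, keepdims=True))
    return E / E.sum(axis=0, keepdims=True)

rng = np.random.default_rng(3)
d, d_h, n, n_q = 4, 3, 6, 2
X = rng.normal(size=(d, n))
Wk = rng.normal(size=(d_h, d))
Wv = rng.normal(size=(d_h, d))
Q = rng.normal(size=(d_h, n_q))          # stands in for a learned query, independent of X

pi = rng.permutation(n)
P = np.eye(n)[:, pi]
M = rng.normal(size=(n, n))

# softmax(P_pi^T M P_pi) = P_pi^T softmax(M) P_pi for any matrix M.
assert np.allclose(softmax_cols(P.T @ M @ P), P.T @ softmax_cols(M) @ P)

def A_Q(Z):
    # A_Q(Z) = W_v Z softmax((W_k Z)^T Q)
    return (Wv @ Z) @ softmax_cols((Wk @ Z).T @ Q)

assert np.allclose(A_Q(X @ P), A_Q(X))   # spatial permutation invariance
```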
