Absolute Positional Encoding
Relative Positional Encoding
ALiBi
RoPE
NoPE
Ready for some math?
A self-attention operator is permutation equivariant, while an attention operator with a learned query is permutation invariant.
Consider an image or feature map $X \in \mathbb{R}^{N \times d}$, where $N$ denotes the spatial dimension and $d$ denotes the number of features. Let $\pi$ denote a permutation of $N$ elements. A transformation $T_\pi$ is called a spatial permutation if $T_\pi(X) = P_\pi X$, where $P_\pi$ denotes the permutation matrix associated with $\pi$, defined as $P_\pi = [e_{\pi(1)}, \dots, e_{\pi(N)}]^\top$ with $e_i$ being a one-hot vector of length $N$ whose $i$-th element is 1.
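To make the definition concrete, here is a minimal numpy sketch (the variable names are mine, not from the text) that builds $P_\pi$ from a permutation $\pi$ and checks that $T_\pi(X) = P_\pi X$ simply reorders the rows of $X$:

```python
import numpy as np

rng = np.random.default_rng(0)

N, d = 5, 3
X = rng.normal(size=(N, d))   # feature map: N spatial positions, d features
pi = rng.permutation(N)       # a permutation of {0, ..., N-1}

# Permutation matrix P_pi: row i is the one-hot vector e_{pi(i)}
P = np.eye(N)[pi]

# T_pi(X) = P_pi X reorders the rows (spatial positions) of X
assert np.allclose(P @ X, X[pi])
```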
Definition
An operator $f$ is spatial permutation equivariant if $f(P_\pi X) = P_\pi f(X)$ for any $X$ and any spatial permutation $P_\pi$. In addition, an operator $f$ is spatial permutation invariant if $f(P_\pi X) = f(X)$ for any $X$ and any spatial permutation $P_\pi$.
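A quick numerical illustration of the two definitions, using two toy operators of my own choosing: a position-wise map, which is equivariant, and mean pooling over positions, which is invariant:

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 5, 3
X = rng.normal(size=(N, d))
P = np.eye(N)[rng.permutation(N)]
W = rng.normal(size=(d, d))

# A position-wise map acts on each row independently, so it is
# spatial permutation equivariant: f(P X) = P f(X).
f = lambda X: np.maximum(X @ W, 0.0)
assert np.allclose(f(P @ X), P @ f(X))

# Mean pooling over positions discards order, so it is
# spatial permutation invariant: g(P X) = g(X).
g = lambda X: X.mean(axis=0)
assert np.allclose(g(P @ X), g(X))
```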
Proposition
A self-attention operator is permutation equivariant, while an attention operator with a learned query is permutation invariant. In particular, denoting by $X \in \mathbb{R}^{N \times d}$ the input matrix and by $P_\pi$ any spatial permutation, we have

$$\operatorname{SelfAttn}(P_\pi X) = P_\pi \operatorname{SelfAttn}(X)$$

and

$$\operatorname{Attn}(q, P_\pi X) = \operatorname{Attn}(q, X).$$
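Assuming the standard dot-product attention parameterization used in the proof below (and omitting the $1/\sqrt{d}$ scaling, which does not affect the argument), both identities can be checked numerically; the helper names here are my own:

```python
import numpy as np

def softmax(A):
    # Row-wise softmax, numerically stabilized
    A = A - A.max(axis=-1, keepdims=True)
    E = np.exp(A)
    return E / E.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(2)
N, d = 6, 4
X  = rng.normal(size=(N, d))
Wq = rng.normal(size=(d, d))
Wk = rng.normal(size=(d, d))
Wv = rng.normal(size=(d, d))
q  = rng.normal(size=(1, d))        # learned query, independent of X
P  = np.eye(N)[rng.permutation(N)]  # spatial permutation

def self_attn(X):
    return softmax((X @ Wq) @ (X @ Wk).T) @ (X @ Wv)

def attn(q, X):
    return softmax((q @ Wq) @ (X @ Wk).T) @ (X @ Wv)

# Equivariance: SelfAttn(P X) = P SelfAttn(X)
assert np.allclose(self_attn(P @ X), P @ self_attn(X))

# Invariance: Attn(q, P X) = Attn(q, X)
assert np.allclose(attn(q, P @ X), attn(q, X))
```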
When applying a spatial permutation $P_\pi$ to the input of a self-attention operator $\operatorname{SelfAttn}(X) = \operatorname{softmax}\!\left(X W_Q W_K^\top X^\top\right) X W_V$, we have

$$\operatorname{SelfAttn}(P_\pi X) = \operatorname{softmax}\!\left(P_\pi X W_Q W_K^\top X^\top P_\pi^\top\right) P_\pi X W_V = P_\pi \operatorname{softmax}\!\left(X W_Q W_K^\top X^\top\right) P_\pi^\top P_\pi X W_V = P_\pi \operatorname{SelfAttn}(X).$$
Note that $P_\pi^\top P_\pi = I$ since $P_\pi$ is an orthogonal matrix. It is also easy to verify that

$$\operatorname{softmax}(P_\pi M P_\pi^\top) = P_\pi \operatorname{softmax}(M) P_\pi^\top$$

for any matrix $M$, since the row-wise softmax commutes with permuting rows and with permuting columns (in particular, $\operatorname{softmax}(M P_\pi^\top) = \operatorname{softmax}(M) P_\pi^\top$). Hence $\operatorname{SelfAttn}$ is spatial permutation equivariant.
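The softmax identity is easy to check numerically; this sketch also verifies the one-sided variant used in the learned-query case below:

```python
import numpy as np

def softmax(A):
    A = A - A.max(axis=-1, keepdims=True)
    E = np.exp(A)
    return E / E.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(3)
N = 6
M = rng.normal(size=(N, N))
P = np.eye(N)[rng.permutation(N)]

# Conjugating by a permutation reorders rows and columns; the row-wise
# softmax commutes with both reorderings.
assert np.allclose(softmax(P @ M @ P.T), P @ softmax(M) @ P.T)

# One-sided variant: permuting columns commutes with the row-wise softmax.
assert np.allclose(softmax(M @ P.T), softmax(M) @ P.T)
```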
Similarly, when applying $P_\pi$ to the input of an attention operator $\operatorname{Attn}(q, X) = \operatorname{softmax}\!\left(q W_Q W_K^\top X^\top\right) X W_V$ with a learned query $q$, which is independent of the input $X$, we have

$$\operatorname{Attn}(q, P_\pi X) = \operatorname{softmax}\!\left(q W_Q W_K^\top X^\top P_\pi^\top\right) P_\pi X W_V = \operatorname{softmax}\!\left(q W_Q W_K^\top X^\top\right) P_\pi^\top P_\pi X W_V = \operatorname{Attn}(q, X).$$

Hence $\operatorname{Attn}$ is spatial permutation invariant.