📚 Theory · Intermediate

Equivariance & Invariance

Key Points

  • Equivariance means that applying a transformation before a function is the same as applying a corresponding transformation after the function.
  • Invariance is a special case of equivariance where the output does not change under the group’s transformations.
  • Convolution is translation-equivariant; shifting the input shifts the output by the same amount.
  • Pooling operations like global sum or max are permutation-invariant; reordering elements does not change the result.
  • For sets, every linear permutation-equivariant layer has the form y_i = A x_i + B ∑_j x_j, which commutes with any reordering.
  • Equivariance encodes symmetry so models generalize better with fewer parameters and less data.
  • Boundary conditions and discretization (e.g., padding choice) can break exact equivariance in practice.
  • Composing equivariant layers with invariant pooling yields invariant models, a common design in CNNs, GNNs, and DeepSets.

Prerequisites

  • Basics of group theory — Understanding groups and actions is essential to define and reason about equivariance and invariance.
  • Linear algebra — Matrix representations of group actions, convolution as a linear operation, and permutation matrices rely on linear algebra.
  • Discrete signals and convolution — Translation equivariance is most easily seen through convolution on discrete signals.
  • Modular arithmetic — Circular shifts and periodic boundary conditions use modulo indexing.
  • Vector norms and numerical stability — Testing equivariance numerically requires tolerance-based comparisons and awareness of floating-point error.
  • Graphs and sets representation — Permutation equivariance/invariance is defined over sets and graphs, requiring row-wise data handling.

Detailed Explanation


01 Overview

Equivariance and invariance are principles that capture how functions should behave when inputs are transformed by symmetries—like shifts, rotations, or permutations. Intuitively, a function is equivariant if transforming the input leads to a predictable transformation of the output. For example, if you shift an image to the right and then run a convolution, the output also shifts right by the same amount. Invariance is the special case where the output does not change at all under the transformation; for instance, a classifier’s label for an object should not depend on where the object appears in an image. These ideas arise from group theory: a group represents a set of transformations that can be combined (like doing two shifts in a row) and have an identity (doing nothing). Designing algorithms and models that respect these symmetries leads to better generalization and efficiency, because we build the symmetry knowledge directly into the computation instead of relying on data to teach it. In practice, this shows up in convolutional neural networks (translation equivariance), graph neural networks (permutation equivariance), and physics-inspired models (rotation/SE(3) equivariance).

02 Intuition & Analogies

Imagine you’re reading text on a sliding whiteboard. If someone slides the board to the right, the sentence is still the same—just shifted. A tool like a highlighter that marks nouns should mark the same words no matter where the board sits. If the highlighting also shifts with the text, the tool is equivariant to shifts. If the tool simply counts how many nouns appear and returns a number that doesn’t move with the board, it’s invariant to shifts. Another analogy: think of a bag of marbles. The order in which you pull marbles out is irrelevant to the bag’s total weight. Any computation that just depends on the collection (not the order) is permutation-invariant. If you label each marble with a score and then reorder the marbles, the list of scores should reorder the same way—this is permutation equivariance. In image processing, convolution is like sliding a stencil over an image to detect patterns. If you slide the image, the detected pattern map slides the same way—so convolution is translation-equivariant. Finally, consider renaming nodes in a social network: a meaningful analysis should not depend on the node IDs. A graph algorithm that simply relabels its outputs when you relabel inputs is permutation-equivariant; if it produces a single graph-level summary that is unaffected by relabeling, it’s permutation-invariant. These intuitions guide us to design computations that respect the problem’s inherent symmetries.

03 Formal Definition

Let G be a group acting on spaces X and Y via transformations T_g: X → X and T'_g: Y → Y that satisfy T_e = Id and T_{gh} = T_g ∘ T_h for all g, h ∈ G. A function f: X → Y is G-equivariant if for all g ∈ G and x ∈ X we have f(T_g(x)) = T'_g(f(x)). If T'_g is the identity on Y for all g (i.e., T'_g = Id), then f is G-invariant: f(T_g(x)) = f(x). When X and Y are vector spaces and the group acts linearly (via representations ρ_X(g), ρ_Y(g)), the equivariance condition becomes f(ρ_X(g) x) = ρ_Y(g) f(x). For discrete sets with n elements, the symmetric group S_n acts by permutation matrices P; permutation equivariance means f(Px) = P f(x). For translation on discrete signals, the shift operator S_τ acts by (S_τ x)[t] = x[t−τ]; a linear time-invariant convolution (x ∗ k)[t] = ∑_j x[t−j] k[j] satisfies (S_τ x) ∗ k = S_τ (x ∗ k), i.e., translation equivariance. These definitions can be visualized as a commuting diagram: applying T_g and then f equals applying f and then T'_g.

04 When to Use

Use equivariance when the task’s outputs should transform in lockstep with the inputs under known symmetries. Examples: feature maps in CNNs (translation symmetry), node features in GNNs under node permutations, vector fields under rotations in physics simulations, and molecular force prediction under SE(3) (rotation/translation) symmetry. Use invariance when the final output should not depend on the transformation: image classification that ignores object location (translation invariance via global pooling), graph classification that ignores node labels (permutation invariance via pooling over nodes), or set regression where order does not matter. In practice, you often stack equivariant layers (to propagate local structure consistently across the domain) and then apply an invariant readout (like sum/mean/max pooling) to produce a stable global summary. Choose the group carefully: circular padding enables exact shift equivariance on periodic signals; reflection or rotation groups can be used for crystals or microscopy; permutation groups are natural for sets and graphs. If your data has approximate symmetries (e.g., small perspective distortions), equivariance can still help, but you may need data augmentation or relaxed architectures.

⚠️Common Mistakes

  1. Confusing equivariance with invariance: equivariant outputs move with inputs; invariant outputs do not change. Be explicit about T_g and T'_g.
  2. Breaking equivariance with boundary handling: zero padding in convolution is not strictly translation-equivariant; circular padding is.
  3. Ignoring representation choices: if the group action on outputs is wrong (e.g., using the identity when outputs should rotate), the equivariance claim fails.
  4. Assuming any nonlinearity preserves equivariance: pointwise nonlinearities do preserve many equivariances (same action on each coordinate), but operations that couple coordinates in an action-dependent way may break it.
  5. Mishandling permutations: forgetting to permute both features and adjacency in graphs ruins permutation equivariance.
  6. Over-constraining models: forcing invariance too early (e.g., pooling at the input) can discard useful information needed to solve the task.
  7. Floating-point expectations: numerical tests of equivariance should allow small tolerances due to rounding.
  8. Using the wrong group: approximate or missing symmetries can mislead model design; validate that the symmetry holds for your data and objective.

Key Formulas

Equivariance Definition

f(T_g(x)) = T'_g(f(x))

Explanation: Applying the group action to the input and then f equals applying f and then the corresponding action on the output. This captures how outputs should transform predictably with inputs.

Invariance Definition

f(T_g(x)) = f(x)

Explanation: The function f is unchanged under the group action on inputs. This is the special case of equivariance where the output action is the identity.

Group Action Axioms

T_e = Id,  T_{gh} = T_g ∘ T_h

Explanation: These properties ensure the transformations form a consistent action of the group on a space. The identity does nothing, and composition matches group multiplication.

Discrete Convolution

(x ∗ k)[t] = ∑_{j=−∞}^{∞} x[t−j] k[j]

Explanation: Convolution at time t sums products of input x with a reversed, shifted kernel k. On finite signals we restrict the sum and handle boundaries by padding.

Translation Equivariance of Convolution

(S_τ x)[t] = x[t−τ],  (S_τ x) ∗ k = S_τ (x ∗ k)

Explanation: Shifting the input by τ and then convolving equals convolving first and then shifting by τ. This explains why CNN feature maps move with the input.

Permutation Equivariance

f(PX) = P f(X),  P ∈ S_n

Explanation: For a set or graph with n elements, reordering the elements by permutation P reorders the outputs in the same way. This is essential for set-/graph-structured data.

General Linear Permutation-Equivariant Map

y_i = A x_i + B ∑_{j=1}^{n} x_j

Explanation: Every linear, permutation-equivariant map on element features has this form. The shared A acts locally and B couples to the global sum, ensuring commutation with any permutation.

DeepSets Invariant Form

f(X) = ρ(∑_{i=1}^{n} φ(x_i))

Explanation: A universal form for permutation-invariant functions on sets. First map each element by ϕ, sum them, and then apply ρ to obtain an order-agnostic result.

Convolution Theorem

ℱ{x ∗ k}(ω) = x̂(ω) k̂(ω)

Explanation: In the frequency domain, convolution becomes pointwise multiplication. This diagonalization explains why convolution commutes with translations.

Invariant Readout from Equivariant Features

q∘f is invariant if f is equivariant and q is invariant

Explanation: Composing an equivariant feature extractor with an invariant pooling yields an invariant overall function. This is the standard CNN/GNN design pattern.

Complexity Analysis

The computational cost of enforcing or testing equivariance depends on the operation and group action. For 1D discrete convolution of a length-n signal with a length-k kernel, the naive implementation runs in O(nk) time and O(n) extra space for the output. Convolution is translation-equivariant provided we handle boundaries consistently; using circular padding gives exact equivariance on a cyclic domain. If many convolutions are required, FFT-based convolution reduces time to O(n log n) but still preserves equivariance under circular shifts. For permutation-equivariant linear layers on sets of n elements with d-dimensional features, computing y_i = A x_i + B ∑_j x_j requires first forming the global sum in O(nd) time, then applying A to each element and B once to the shared sum, totaling O(nd^2) time; the space is O(nd) for storing inputs and outputs plus O(d^2) for parameters. Applying a permutation to reorder elements takes O(n) index moves (or O(nd) if copying d-dimensional vectors), and testing equivariance numerically involves two forward passes plus a permutation, yielding O(nd^2) total. Invariant pooling via sum/mean/max over n elements costs O(nd) time (a single pass) and O(d) space for the accumulator. These complexities make equivariant layers efficient compared to unconstrained alternatives that may scale with n^2 (e.g., dense attention), while embedding strong inductive biases. Practical caveats include memory bandwidth limits when n and d are large, and floating-point discrepancies that can introduce tiny deviations from exact equality, so comparisons should use tolerances rather than strict equality.

Code Examples

Translation Equivariance of Circular 1D Convolution
#include <bits/stdc++.h>
using namespace std;

// Circular index helper: ensures (i mod n) is in [0, n-1]
static inline int mod_idx(int i, int n) {
    int r = i % n;
    return r < 0 ? r + n : r;
}

// Compute circular 1D convolution y = x * k with length(x) = n, length(k) = m
// y[t] = sum_j k[j] * x[(t - j) mod n]
vector<double> circular_convolve(const vector<double>& x, const vector<double>& k) {
    int n = (int)x.size();
    int m = (int)k.size();
    vector<double> y(n, 0.0);
    for (int t = 0; t < n; ++t) {
        double acc = 0.0;
        for (int j = 0; j < m; ++j) {
            int idx = mod_idx(t - j, n);
            acc += k[j] * x[idx];
        }
        y[t] = acc;
    }
    return y;
}

// Circularly shift a signal by shift s: (S_s x)[t] = x[(t - s) mod n]
vector<double> circular_shift(const vector<double>& x, int s) {
    int n = (int)x.size();
    vector<double> y(n);
    for (int t = 0; t < n; ++t) {
        y[t] = x[mod_idx(t - s, n)];
    }
    return y;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Create a random signal and kernel
    int n = 128; // signal length
    int m = 9;   // kernel length (small for locality)
    int s = 17;  // shift amount

    std::mt19937 rng(42);
    std::normal_distribution<double> dist(0.0, 1.0);

    vector<double> x(n), k(m);
    for (int i = 0; i < n; ++i) x[i] = dist(rng);
    for (int j = 0; j < m; ++j) k[j] = dist(rng);

    // Compute y = x * k
    vector<double> y = circular_convolve(x, k);

    // Compute shifted signal and its convolution
    vector<double> x_shift = circular_shift(x, s);
    vector<double> y_from_shifted = circular_convolve(x_shift, k);

    // Shift the original convolution output
    vector<double> y_shift = circular_shift(y, s);

    // Compare: translation equivariance means y_from_shifted == y_shift
    double max_abs_diff = 0.0;
    for (int i = 0; i < n; ++i) {
        max_abs_diff = max(max_abs_diff, fabs(y_from_shifted[i] - y_shift[i]));
    }

    cout << fixed << setprecision(6);
    cout << "Max |(S_s x)*k - S_s(x*k)| = " << max_abs_diff << "\n";
    if (max_abs_diff < 1e-9) {
        cout << "Equivariance verified (up to numerical precision).\n";
    } else {
        cout << "Equivariance broken (likely by implementation or precision).\n";
    }

    return 0;
}

This program implements circular 1D convolution and a circular shift. It demonstrates translation equivariance by showing that convolving a shifted input equals shifting the convolved output. Circular padding ensures exact equivariance on a periodic domain. The maximum absolute difference should be near machine precision.

Time: O(nm) for convolution; O(n) for shifting; total O(nm)
Space: O(n) for outputs plus O(n + m) for inputs and kernel
Permutation-Equivariant Linear Layer for Sets
#include <bits/stdc++.h>
using namespace std;

// Multiply a dxd matrix M by a d-vector v
vector<double> matvec(const vector<vector<double>>& M, const vector<double>& v) {
    int d = (int)v.size();
    vector<double> out(d, 0.0);
    for (int i = 0; i < d; ++i) {
        double acc = 0.0;
        for (int j = 0; j < d; ++j) acc += M[i][j] * v[j];
        out[i] = acc;
    }
    return out;
}

// Add two d-vectors
vector<double> add_vec(const vector<double>& a, const vector<double>& b) {
    int d = (int)a.size();
    vector<double> out(d);
    for (int i = 0; i < d; ++i) out[i] = a[i] + b[i];
    return out;
}

// Sum over n elements of d-vectors
vector<double> sum_rows(const vector<vector<double>>& X) {
    int n = (int)X.size();
    int d = (int)X[0].size();
    vector<double> s(d, 0.0);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < d; ++j)
            s[j] += X[i][j];
    return s;
}

// Permute rows of X by permutation p: Y[i] = X[p[i]]
vector<vector<double>> permute(const vector<vector<double>>& X, const vector<int>& p) {
    int n = (int)X.size();
    vector<vector<double>> Y(n, vector<double>(X[0].size()));
    for (int i = 0; i < n; ++i) Y[i] = X[p[i]];
    return Y;
}

// Equivariant layer: y_i = A x_i + B * sum_j x_j
vector<vector<double>> pe_layer(const vector<vector<double>>& X,
                                const vector<vector<double>>& A,
                                const vector<vector<double>>& B) {
    int n = (int)X.size();
    int d = (int)X[0].size();
    vector<double> S = sum_rows(X); // global sum (shared context)
    vector<vector<double>> Y(n, vector<double>(d, 0.0));
    vector<double> BS = matvec(B, S);
    for (int i = 0; i < n; ++i) {
        vector<double> Ax = matvec(A, X[i]);
        for (int j = 0; j < d; ++j) Y[i][j] = Ax[j] + BS[j];
    }
    return Y;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n = 8; // number of elements in the set
    int d = 4; // feature dimension per element

    std::mt19937 rng(123);
    std::normal_distribution<double> dist(0.0, 1.0);

    // Random set X (n x d)
    vector<vector<double>> X(n, vector<double>(d));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < d; ++j)
            X[i][j] = dist(rng);

    // Random parameters A, B (d x d)
    vector<vector<double>> A(d, vector<double>(d)), B(d, vector<double>(d));
    for (int i = 0; i < d; ++i)
        for (int j = 0; j < d; ++j) {
            A[i][j] = dist(rng) * 0.5;
            B[i][j] = dist(rng) * 0.5;
        }

    // Random permutation p
    vector<int> p(n);
    iota(p.begin(), p.end(), 0);
    shuffle(p.begin(), p.end(), rng);

    // Forward on original X
    auto Y1 = pe_layer(X, A, B);
    // Permute inputs and forward
    auto Xp = permute(X, p);
    auto Y2 = pe_layer(Xp, A, B);
    // Permute Y1 to compare: P Y1
    auto PY1 = permute(Y1, p);

    // Compute max absolute difference between Y2 and P Y1
    double max_abs_diff = 0.0;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < d; ++j)
            max_abs_diff = max(max_abs_diff, fabs(Y2[i][j] - PY1[i][j]));

    cout << fixed << setprecision(6);
    cout << "Max |f(PX) - P f(X)| = " << max_abs_diff << "\n";
    if (max_abs_diff < 1e-9) cout << "Permutation equivariance verified.\n";
    else cout << "Equivariance broken (check implementation).\n";

    return 0;
}

This code builds a linear permutation-equivariant layer for sets: y_i = A x_i + B sum_j x_j. It verifies f(PX) = P f(X) for a random permutation P by comparing outputs. The global sum couples elements identically, ensuring commutation with any reordering.

Time: O(nd^2) for the layer; O(nd) for forming the global sum; O(n) to permute indices (O(nd) if copying feature rows)
Space: O(nd) to store inputs/outputs plus O(d^2) for parameters
Permutation-Invariant Readout via Sum/Mean/Max Pooling
#include <bits/stdc++.h>
using namespace std;

struct Aggregates {
    vector<double> sum, mean, mx;
};

Aggregates aggregate(const vector<vector<double>>& X) {
    int n = (int)X.size();
    int d = (int)X[0].size();
    Aggregates aggr;
    aggr.sum.assign(d, 0.0);
    aggr.mx.assign(d, -numeric_limits<double>::infinity());

    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < d; ++j) {
            aggr.sum[j] += X[i][j];
            aggr.mx[j] = max(aggr.mx[j], X[i][j]);
        }
    }
    aggr.mean = aggr.sum;
    for (int j = 0; j < d; ++j) aggr.mean[j] /= max(1, n);
    return aggr;
}

vector<vector<double>> permute(const vector<vector<double>>& X, const vector<int>& p) {
    int n = (int)X.size();
    vector<vector<double>> Y(n, vector<double>(X[0].size()));
    for (int i = 0; i < n; ++i) Y[i] = X[p[i]];
    return Y;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n = 10, d = 3;
    std::mt19937 rng(7);
    std::uniform_real_distribution<double> dist(-1.0, 1.0);

    vector<vector<double>> X(n, vector<double>(d));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < d; ++j)
            X[i][j] = dist(rng);

    // Random permutation
    vector<int> p(n); iota(p.begin(), p.end(), 0); shuffle(p.begin(), p.end(), rng);

    auto A1 = aggregate(X);
    auto A2 = aggregate(permute(X, p));

    auto cmp = [](const vector<double>& a, const vector<double>& b){
        double m = 0.0;
        for (size_t i = 0; i < a.size(); ++i) m = max(m, fabs(a[i] - b[i]));
        return m;
    };

    cout << fixed << setprecision(6);
    cout << "Max |sum - sum(PX)| = " << cmp(A1.sum, A2.sum) << "\n";
    cout << "Max |mean - mean(PX)| = " << cmp(A1.mean, A2.mean) << "\n";
    cout << "Max |max - max(PX)| = " << cmp(A1.mx, A2.mx) << "\n";

    return 0;
}

This example computes sum, mean, and max over element features and shows these aggregations are invariant to permutations. Such invariant readouts are used in DeepSets and GNNs to produce order-independent graph- or set-level summaries.

Time: O(nd) to aggregate; O(n) to permute indices (O(nd) if copying feature rows)
Space: O(d) additional space for accumulators
#equivariance #invariance #group-action #permutation-equivariant #translation-equivariant #convolution #pooling #deepsets #symmetry #representation-theory #permutation-matrix #commutative-diagram #cnn #gnn #circular-shift