📚 Theory · Intermediate

Equivariance & Invariance

Key Points

  • Equivariance means that applying a transformation before a function is the same as applying a corresponding transformation after the function.
  • Invariance is a special case of equivariance where the output does not change under the group’s transformations.
  • Convolution is translation-equivariant; shifting the input shifts the output by the same amount.
  • Pooling operations like global sum or max are permutation-invariant; reordering elements does not change the result.
  • For sets, every linear permutation-equivariant layer has the form y_i = A x_i + B ∑_j x_j, which commutes with any reordering.
  • Equivariance encodes symmetry so models generalize better with fewer parameters and less data.
  • Boundary conditions and discretization (e.g., padding choice) can break exact equivariance in practice.
  • Composing equivariant layers with invariant pooling yields invariant models, a common design in CNNs, GNNs, and DeepSets.

Prerequisites

  • Basics of group theory — Understanding groups and actions is essential to define and reason about equivariance and invariance.
  • Linear algebra — Matrix representations of group actions, convolution as a linear operation, and permutation matrices rely on linear algebra.
  • Discrete signals and convolution — Translation equivariance is most easily seen through convolution on discrete signals.
  • Modular arithmetic — Circular shifts and periodic boundary conditions use modulo indexing.
  • Vector norms and numerical stability — Testing equivariance numerically requires tolerance-based comparisons and awareness of floating-point error.
  • Graphs and sets representation — Permutation equivariance/invariance is defined over sets and graphs, requiring row-wise data handling.

Detailed Explanation


01 Overview

Equivariance and invariance are principles that capture how functions should behave when inputs are transformed by symmetries—like shifts, rotations, or permutations. Intuitively, a function is equivariant if transforming the input leads to a predictable transformation of the output. For example, if you shift an image to the right and then run a convolution, the output also shifts right by the same amount. Invariance is the special case where the output does not change at all under the transformation; for instance, a classifier’s label for an object should not depend on where the object appears in an image. These ideas arise from group theory: a group represents a set of transformations that can be combined (like doing two shifts in a row) and have an identity (doing nothing). Designing algorithms and models that respect these symmetries leads to better generalization and efficiency, because we build the symmetry knowledge directly into the computation instead of relying on data to teach it. In practice, this shows up in convolutional neural networks (translation equivariance), graph neural networks (permutation equivariance), and physics-inspired models (rotation/SE(3) equivariance).

02 Intuition & Analogies

Imagine you’re reading text on a sliding whiteboard. If someone slides the board to the right, the sentence is still the same—just shifted. A tool like a highlighter that marks nouns should mark the same words no matter where the board sits. If the highlighting also shifts with the text, the tool is equivariant to shifts. If the tool simply counts how many nouns appear and returns a number that doesn’t move with the board, it’s invariant to shifts. Another analogy: think of a bag of marbles. The order in which you pull marbles out is irrelevant to the bag’s total weight. Any computation that just depends on the collection (not the order) is permutation-invariant. If you label each marble with a score and then reorder the marbles, the list of scores should reorder the same way—this is permutation equivariance. In image processing, convolution is like sliding a stencil over an image to detect patterns. If you slide the image, the detected pattern map slides the same way—so convolution is translation-equivariant. Finally, consider renaming nodes in a social network: a meaningful analysis should not depend on the node IDs. A graph algorithm that simply relabels its outputs when you relabel inputs is permutation-equivariant; if it produces a single graph-level summary that is unaffected by relabeling, it’s permutation-invariant. These intuitions guide us to design computations that respect the problem’s inherent symmetries.

03 Formal Definition

Let G be a group acting on spaces X and Y via transformations T_g: X → X and T'_g: Y → Y that satisfy T_e = Id and T_{gh} = T_g ∘ T_h for all g, h ∈ G. A function f: X → Y is G-equivariant if for all g ∈ G and x ∈ X we have f(T_g(x)) = T'_g(f(x)). If T'_g is the identity on Y for all g (i.e., T'_g = Id), then f is G-invariant: f(T_g(x)) = f(x). When X and Y are vector spaces and the group acts linearly (via representations ρ_X(g), ρ_Y(g)), the equivariance condition becomes f(ρ_X(g) x) = ρ_Y(g) f(x). For discrete sets with n elements, the symmetric group S_n acts by permutation matrices P; permutation equivariance means f(Px) = P f(x). For translation on discrete signals, the shift operator S_τ acts by (S_τ x)[t] = x[t−τ]; a linear time-invariant convolution (x ∗ k)[t] = ∑_j x[t−j] k[j] satisfies (S_τ x) ∗ k = S_τ (x ∗ k), i.e., translation equivariance. These definitions can be visualized as a commuting diagram: applying T_g and then f equals applying f and then T'_g.

04 When to Use

Use equivariance when the task’s outputs should transform in lockstep with the inputs under known symmetries. Examples: feature maps in CNNs (translation symmetry), node features in GNNs under node permutations, vector fields under rotations in physics simulations, and molecular force prediction under SE(3) (rotation/translation) symmetry. Use invariance when the final output should not depend on the transformation: image classification that ignores object location (translation invariance via global pooling), graph classification that ignores node labels (permutation invariance via pooling over nodes), or set regression where order does not matter. In practice, you often stack equivariant layers (to propagate local structure consistently across the domain) and then apply an invariant readout (like sum/mean/max pooling) to produce a stable global summary. Choose the group carefully: circular padding enables exact shift equivariance on periodic signals; reflection or rotation groups can be used for crystals or microscopy; permutation groups are natural for sets and graphs. If your data has approximate symmetries (e.g., small perspective distortions), equivariance can still help, but you may need data augmentation or relaxed architectures.

⚠️Common Mistakes

  1. Confusing equivariance with invariance: equivariant outputs move with inputs; invariant outputs do not change. Be explicit about T_g and T'_g.
  2. Breaking equivariance with boundary handling: zero padding in convolution is not strictly translation-equivariant; circular padding is.
  3. Ignoring representation choices: if the group action on outputs is wrong (e.g., using the identity when outputs should rotate), the equivariance claim fails.
  4. Assuming any nonlinearity preserves equivariance: pointwise nonlinearities do preserve many equivariances (same action on each coordinate), but operations that couple coordinates in an action-dependent way may break it.
  5. Mishandling permutations: forgetting to permute both features and adjacency in graphs ruins permutation equivariance.
  6. Over-constraining models: forcing invariance too early (e.g., pooling at the input) can discard useful information needed to solve the task.
  7. Floating-point expectations: numerical tests of equivariance should allow small tolerances due to rounding.
  8. Using the wrong group: approximate or missing symmetries can mislead model design; validate that the symmetry holds for your data and objective.

Key Formulas

Equivariance Definition

f(T_g(x)) = T'_g(f(x))

Explanation: Applying the group action to the input and then f equals applying f and then the corresponding action on the output. This captures how outputs should transform predictably with inputs.

Invariance Definition

f(T_g(x)) = f(x)

Explanation: The function f is unchanged under the group action on inputs. This is the special case of equivariance where the output action is the identity.

Group Action Axioms

T_e = Id,  T_{gh} = T_g ∘ T_h

Explanation: These properties ensure the transformations form a consistent action of the group on a space. The identity does nothing, and composition matches group multiplication.

Discrete Convolution

(x ∗ k)[t] = ∑_{j=−∞}^{∞} x[t−j] k[j]

Explanation: Convolution at time t sums products of input x with a reversed, shifted kernel k. On finite signals we restrict the sum and handle boundaries by padding.

Translation Equivariance of Convolution

(S_τ x)[t] = x[t−τ],  (S_τ x) ∗ k = S_τ (x ∗ k)

Explanation: Shifting the input by τ and then convolving equals convolving first and then shifting by τ. This explains why CNN feature maps move with the input.

Permutation Equivariance

f(PX) = P f(X),  P ∈ S_n

Explanation: For a set or graph with n elements, reordering the elements by permutation P reorders the outputs in the same way. This is essential for set-/graph-structured data.

General Linear Permutation-Equivariant Map

y_i = A x_i + B ∑_{j=1}^{n} x_j

Explanation: Every linear, permutation-equivariant map on element features has this form. The shared A acts locally and B couples to the global sum, ensuring commutation with any permutation.

DeepSets Invariant Form

f(X) = ρ(∑_{i=1}^{n} φ(x_i))

Explanation: A universal form for permutation-invariant functions on sets. First map each element by ϕ, sum them, and then apply ρ to obtain an order-agnostic result.

Convolution Theorem

ℱ{x ∗ k}(ω) = x̂(ω) k̂(ω)

Explanation: In the frequency domain, convolution becomes pointwise multiplication. This diagonalization explains why convolution commutes with translations.

Invariant Readout from Equivariant Features

q∘f is invariant if f is equivariant and q is invariant

Explanation: Composing an equivariant feature extractor with an invariant pooling yields an invariant overall function. This is the standard CNN/GNN design pattern.

Complexity Analysis

The computational cost of enforcing or testing equivariance depends on the operation and group action. For 1D discrete convolution of a length-n signal with a length-k kernel, the naive implementation runs in O(nk) time and O(n) extra space for the output. Convolution is translation-equivariant provided we handle boundaries consistently; using circular padding gives exact equivariance on a cyclic domain. If many convolutions are required, FFT-based convolution reduces time to O(n log n) but still preserves equivariance under circular shifts. For permutation-equivariant linear layers on sets of n elements with d-dimensional features, computing y_i = A x_i + B ∑_j x_j requires first forming the global sum in O(nd) time, then applying A to each element and B once to the shared sum, totaling O(nd^2) time; the space is O(nd) for storing inputs and outputs plus O(d^2) for parameters. Applying a permutation to reorder elements takes O(n) index moves (or O(nd) if copying d-dimensional vectors), and testing equivariance numerically involves two forward passes plus a permutation, yielding O(nd^2) total. Invariant pooling via sum/mean/max over n elements costs O(nd) time (a single pass) and O(d) space for the accumulator. These complexities make equivariant layers efficient compared to unconstrained alternatives that may scale with n^2 (e.g., dense attention), while embedding strong inductive biases. Practical caveats include memory bandwidth limits when n and d are large, and floating-point discrepancies that can introduce tiny deviations from exact equality, so comparisons should use tolerances rather than strict equality.

Code Examples

Translation Equivariance of Circular 1D Convolution
#include <bits/stdc++.h>
using namespace std;

// Circular index helper: ensures (i mod n) is in [0, n-1]
static inline int mod_idx(int i, int n) {
    int r = i % n;
    return r < 0 ? r + n : r;
}

// Compute circular 1D convolution y = x * k with length(x) = n, length(k) = m
// y[t] = sum_j k[j] * x[(t - j) mod n]
vector<double> circular_convolve(const vector<double>& x, const vector<double>& k) {
    int n = (int)x.size();
    int m = (int)k.size();
    vector<double> y(n, 0.0);
    for (int t = 0; t < n; ++t) {
        double acc = 0.0;
        for (int j = 0; j < m; ++j) {
            int idx = mod_idx(t - j, n);
            acc += k[j] * x[idx];
        }
        y[t] = acc;
    }
    return y;
}

// Circularly shift a signal by shift s: (S_s x)[t] = x[(t - s) mod n]
vector<double> circular_shift(const vector<double>& x, int s) {
    int n = (int)x.size();
    vector<double> y(n);
    for (int t = 0; t < n; ++t) {
        y[t] = x[mod_idx(t - s, n)];
    }
    return y;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Create a random signal and kernel
    int n = 128; // signal length
    int m = 9;   // kernel length (small for locality)
    int s = 17;  // shift amount

    std::mt19937 rng(42);
    std::normal_distribution<double> dist(0.0, 1.0);

    vector<double> x(n), k(m);
    for (int i = 0; i < n; ++i) x[i] = dist(rng);
    for (int j = 0; j < m; ++j) k[j] = dist(rng);

    // Compute y = x * k
    vector<double> y = circular_convolve(x, k);

    // Compute shifted signal and its convolution
    vector<double> x_shift = circular_shift(x, s);
    vector<double> y_from_shifted = circular_convolve(x_shift, k);

    // Shift the original convolution output
    vector<double> y_shift = circular_shift(y, s);

    // Compare: translation equivariance means y_from_shifted == y_shift
    double max_abs_diff = 0.0;
    for (int i = 0; i < n; ++i) {
        max_abs_diff = max(max_abs_diff, fabs(y_from_shifted[i] - y_shift[i]));
    }

    cout << fixed << setprecision(6);
    cout << "Max |(S_s x)*k - S_s(x*k)| = " << max_abs_diff << "\n";
    if (max_abs_diff < 1e-9) {
        cout << "Equivariance verified (up to numerical precision).\n";
    } else {
        cout << "Equivariance broken (likely by implementation or precision).\n";
    }

    return 0;
}

This program implements circular 1D convolution and a circular shift. It demonstrates translation equivariance by showing that convolving a shifted input equals shifting the convolved output. Circular padding ensures exact equivariance on a periodic domain. The maximum absolute difference should be near machine precision.

Time: O(nm) for convolution; O(n) for shifting; total O(nm)
Space: O(n) for outputs plus O(n + m) for inputs and kernel
Permutation-Equivariant Linear Layer for Sets
#include <bits/stdc++.h>
using namespace std;

// Multiply a dxd matrix M by a d-vector v
vector<double> matvec(const vector<vector<double>>& M, const vector<double>& v) {
    int d = (int)v.size();
    vector<double> out(d, 0.0);
    for (int i = 0; i < d; ++i) {
        double acc = 0.0;
        for (int j = 0; j < d; ++j) acc += M[i][j] * v[j];
        out[i] = acc;
    }
    return out;
}

// Add two d-vectors
vector<double> add_vec(const vector<double>& a, const vector<double>& b) {
    int d = (int)a.size();
    vector<double> out(d);
    for (int i = 0; i < d; ++i) out[i] = a[i] + b[i];
    return out;
}

// Sum over n elements of d-vectors
vector<double> sum_rows(const vector<vector<double>>& X) {
    int n = (int)X.size();
    int d = (int)X[0].size();
    vector<double> s(d, 0.0);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < d; ++j)
            s[j] += X[i][j];
    return s;
}

// Permute rows of X by permutation p: Y[i] = X[p[i]]
vector<vector<double>> permute(const vector<vector<double>>& X, const vector<int>& p) {
    int n = (int)X.size();
    vector<vector<double>> Y(n, vector<double>(X[0].size()));
    for (int i = 0; i < n; ++i) Y[i] = X[p[i]];
    return Y;
}

// Equivariant layer: y_i = A x_i + B * sum_j x_j
vector<vector<double>> pe_layer(const vector<vector<double>>& X,
                                const vector<vector<double>>& A,
                                const vector<vector<double>>& B) {
    int n = (int)X.size();
    int d = (int)X[0].size();
    vector<double> S = sum_rows(X); // global sum (shared context)
    vector<vector<double>> Y(n, vector<double>(d, 0.0));
    vector<double> BS = matvec(B, S);
    for (int i = 0; i < n; ++i) {
        vector<double> Ax = matvec(A, X[i]);
        for (int j = 0; j < d; ++j) Y[i][j] = Ax[j] + BS[j];
    }
    return Y;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n = 8; // number of elements in the set
    int d = 4; // feature dimension per element

    std::mt19937 rng(123);
    std::normal_distribution<double> dist(0.0, 1.0);

    // Random set X (n x d)
    vector<vector<double>> X(n, vector<double>(d));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < d; ++j)
            X[i][j] = dist(rng);

    // Random parameters A, B (d x d)
    vector<vector<double>> A(d, vector<double>(d)), B(d, vector<double>(d));
    for (int i = 0; i < d; ++i)
        for (int j = 0; j < d; ++j) {
            A[i][j] = dist(rng) * 0.5;
            B[i][j] = dist(rng) * 0.5;
        }

    // Random permutation p
    vector<int> p(n);
    iota(p.begin(), p.end(), 0);
    shuffle(p.begin(), p.end(), rng);

    // Forward on original X
    auto Y1 = pe_layer(X, A, B);
    // Permute inputs and forward
    auto Xp = permute(X, p);
    auto Y2 = pe_layer(Xp, A, B);
    // Permute Y1 to compare: P Y1
    auto PY1 = permute(Y1, p);

    // Compute max absolute difference between Y2 and P Y1
    double max_abs_diff = 0.0;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < d; ++j)
            max_abs_diff = max(max_abs_diff, fabs(Y2[i][j] - PY1[i][j]));

    cout << fixed << setprecision(6);
    cout << "Max |f(PX) - P f(X)| = " << max_abs_diff << "\n";
    if (max_abs_diff < 1e-9) cout << "Permutation equivariance verified.\n";
    else cout << "Equivariance broken (check implementation).\n";

    return 0;
}

This code builds a linear permutation-equivariant layer for sets: y_i = A x_i + B sum_j x_j. It verifies f(PX) = P f(X) for a random permutation P by comparing outputs. The global sum couples elements identically, ensuring commutation with any reordering.

Time: O(nd^2) for the layer; O(nd) for forming the global sum; O(n) to permute indices (O(nd) if copying feature rows)
Space: O(nd) to store inputs/outputs plus O(d^2) for parameters
Permutation-Invariant Readout via Sum/Mean/Max Pooling
#include <bits/stdc++.h>
using namespace std;

struct Aggregates {
    vector<double> sum, mean, mx;
};

Aggregates aggregate(const vector<vector<double>>& X) {
    int n = (int)X.size();
    int d = (int)X[0].size();
    Aggregates aggr;
    aggr.sum.assign(d, 0.0);
    aggr.mx.assign(d, -numeric_limits<double>::infinity());

    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < d; ++j) {
            aggr.sum[j] += X[i][j];
            aggr.mx[j] = max(aggr.mx[j], X[i][j]);
        }
    }
    aggr.mean = aggr.sum;
    for (int j = 0; j < d; ++j) aggr.mean[j] /= max(1, n);
    return aggr;
}

vector<vector<double>> permute(const vector<vector<double>>& X, const vector<int>& p) {
    int n = (int)X.size();
    vector<vector<double>> Y(n, vector<double>(X[0].size()));
    for (int i = 0; i < n; ++i) Y[i] = X[p[i]];
    return Y;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n = 10, d = 3;
    std::mt19937 rng(7);
    std::uniform_real_distribution<double> dist(-1.0, 1.0);

    vector<vector<double>> X(n, vector<double>(d));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < d; ++j)
            X[i][j] = dist(rng);

    // Random permutation
    vector<int> p(n); iota(p.begin(), p.end(), 0); shuffle(p.begin(), p.end(), rng);

    auto A1 = aggregate(X);
    auto A2 = aggregate(permute(X, p));

    auto cmp = [](const vector<double>& a, const vector<double>& b){
        double m = 0.0;
        for (size_t i = 0; i < a.size(); ++i) m = max(m, fabs(a[i] - b[i]));
        return m;
    };

    cout << fixed << setprecision(6);
    cout << "Max |sum - sum(PX)| = " << cmp(A1.sum, A2.sum) << "\n";
    cout << "Max |mean - mean(PX)| = " << cmp(A1.mean, A2.mean) << "\n";
    cout << "Max |max - max(PX)| = " << cmp(A1.mx, A2.mx) << "\n";

    return 0;
}

This example computes sum, mean, and max over element features and shows these aggregations are invariant to permutations. Such invariant readouts are used in DeepSets and GNNs to produce order-independent graph- or set-level summaries.

Time: O(nd) to aggregate; O(n) to permute indices (O(nd) if copying feature rows)
Space: O(d) additional space for accumulators
#equivariance #invariance #group-action #permutation-equivariant #translation-equivariant #convolution #pooling #deepsets #symmetry #representation-theory #permutation-matrix #commutative-diagram #cnn #gnn #circular-shift