šŸ“š Theory Ā· Advanced

Gauge Equivariant Networks

Key Points

  • Gauge equivariant networks are neural networks that respect local symmetries (gauges) on manifolds, such as how vectors rotate when you change the local reference frame on a surface.
  • They work by representing features in fibers (like tangent spaces) and ensuring every operation (transport, convolution, nonlinearity) transforms correctly under the structure group (e.g., SO(2) on surfaces).
  • Discrete implementations on meshes use parallel transport angles on edges to align neighbor features before aggregation, and restrict weights to intertwiners that do not break equivariance.
  • A practical recipe is to expand features into circular-harmonic types (orders m = 0, 1, 2, ...), where rotations act by simple 2Ɨ2 blocks, and to share weights within each order.
  • Nonlinearities must be chosen carefully (e.g., norm-based or gated) so that higher-order vector features remain equivariant.
  • You can numerically verify equivariance by randomly changing gauges per vertex, updating edge transport angles accordingly, and checking that outputs rotate exactly as predicted.
  • The computational cost is similar to message-passing GNNs, typically O(E Ā· F) per layer, with small extra trigonometric costs for rotations.
  • Gauge equivariance is essential for learning on surfaces, vector fields, and physics problems where predictions should not depend on arbitrary local frame choices.

Prerequisites

  • Basic group theory — Understand group actions, representations, and the difference between invariance and equivariance.
  • Linear algebra — Features live in vector spaces and are transformed by matrices; weights are intertwiners between representations.
  • Multivariable calculus and manifolds (intro) — Frames, tangent spaces, and the exponential map underpin convolution on manifolds.
  • Graph neural networks (GNNs) — Gauge-equivariant layers on meshes are implemented as message passing with special alignment and weights.
  • Fourier analysis on the circle — Circular harmonics (cos/sin pairs) are the irreducible representations of SO(2) used to rotate features.
  • Differential geometry (connections, parallel transport) — Edge transport angles discretize connections; equivariance relies on correct parallel transport.
  • Representation theory (irreps, intertwiners) — Weight sharing per order and nonlinearity design come from representation-theoretic constraints.
  • Numerical methods on meshes/graphs — Discretization choices (angles, neighbors) affect stability and correctness.
  • C++ programming (vectors, performance) — Implementing efficient rotations, aggregations, and tests requires solid C++.
  • Trigonometry — Rotations use sin/cos and angle normalization; numerical care is needed.

Detailed Explanation


01 Overview

Gauge equivariant networks are neural architectures designed to process data that live on manifolds with local symmetries (gauges). Think of a 2D surface like a curved sheet: at every point you can draw a little x–y axis (a tangent frame). There is no globally consistent way to pick all these axes on a curved surface, and you are free to rotate each local frame without changing the underlying geometry. A network is gauge equivariant if its outputs transform predictably when you rotate these local frames. This is crucial when inputs and outputs are not just scalars but geometric objects like tangent vectors or higher-order features that live in fibers of a bundle over the manifold.

In practice, we discretize the manifold as a mesh or graph, store a frame (gauge) per node, and store how to align frames along edges (a discrete connection, or parallel transport angle). Features are decomposed into irreducible representations of the gauge group (e.g., circular harmonics for SO(2)), which makes rotations act by tiny block matrices. Layers align neighbor features via parallel transport, apply order-wise shared weights (intertwiners), aggregate, and optionally apply equivariant nonlinearities. With these ingredients, the network's predictions depend only on the underlying geometry and data—not on the arbitrary choice of frames.

02 Intuition & Analogies

Imagine you are hiking across a hilly terrain (a surface). At every point you carry a tiny compass (your local frame) showing ā€œeastā€ and ā€œnorth.ā€ When you walk to a neighbor point, the notion of east/north twists a little, depending on the terrain's curvature—this twist is parallel transport. Now suppose you want a rule to combine wind measurements (vectors) from your neighbors to predict the wind at your location. If your friend uses a different brand of compass (rotated by, say, 30° everywhere), both of you should still get predictions that agree up to that same 30° rotation. A network with this property is gauge equivariant.

A second analogy: playlists and headphones. You can wear your headphones at any rotation; the song you hear ā€œrotatesā€ with the headphones, but the underlying melody is unchanged. If two people wear their headphones differently, a device that analyzes the song should produce the same conclusions regardless of headphone angle. In geometry, the headphone angle is the gauge (local frame choice), and the song is the true geometric signal.

Operationally, we break features into types that we know how to rotate easily—like splitting a signal on a circle into sines and cosines. When we rotate the frame by an angle Īø, an m-th order sine/cosine pair just rotates by mĀ·Īø via a 2Ɨ2 matrix. That makes it cheap to ā€œline upā€ neighbor features by undoing their local rotation before summing. We then only allow weights that do not mix different types (orders), so the whole computation commutes with every local rotation. The result is a model that learns geometry-aware patterns without being confused by arbitrary frame choices.

03 Formal Definition

Let M be a smooth manifold with frame bundle F and structure group G (e.g., G = SO(2) for tangent frames on a 2D surface). A feature field is a section s of the associated vector bundle E = F ×_{ρ_in} V_in, where ρ_in : G → GL(V_in) is a representation. A gauge is a choice of local trivialization (frame); under a gauge change g : M → G, the section transforms pointwise as s^g(x) = ρ_in(g(x)) s(x). A layer F mapping sections of E_in to sections of E_out is gauge equivariant if for every gauge field g, F obeys F[s^g](x) = ρ_out(g(x)) F[s](x), where ρ_out is the output representation.

In continuous gauge-equivariant convolution on M, one constructs at each point x a kernel K_x : T_x M → Hom(V_in, V_out), expressed in a local frame. The kernel must satisfy the steerability constraint K_x(RĀ·p) = ρ_out(R) K_x(p) ρ_in(R)^{āˆ’1} for every R ∈ G acting on tangent vectors p. With this, the operator (F s)(x) = ∫_{T_x M} K_x(p) s(exp_x(p)) dp is well defined and independent of the frame chosen at x.

On a discrete mesh or graph, we store a frame at each vertex u and a parallel transport (connection) along each edge as a group element g_{u←v} ∈ G mapping v's frame to u's frame. A message-passing layer is equivariant if it aggregates as h^{(l+1)}(u) = Ī£_{v ∈ N(u)} ρ_out(g_{u←v}) Φ(d_{uv}) ρ_in(g_{u←v})^{āˆ’1} h^{(l)}(v), with weight sharing restricted so that Φ intertwines the two representations per order. Nonlinearities are required to be equivariant too (e.g., acting on norms or via gates from invariant channels).

04 When to Use

  • Surface or manifold data: Learning from signals on curved surfaces (e.g., textures on 3D meshes, climate fields on the sphere) where directions matter and rotate from point to point.
  • Vector/tensor fields: Tasks with inputs/outputs that are not scalars—wind velocity on Earth, fiber orientations in diffusion MRI, principal curvature directions, or electromagnetic fields—where predictions must rotate with local frames.
  • Physics and PDEs on meshes: When physical laws are covariant and you want models that respect local symmetry, avoiding spurious frame-dependent artifacts.
  • Low-data or generalization-critical regimes: Equivariance bakes in strong inductive bias, improving data efficiency and robustness when orientations vary or gauges are arbitrary.
  • Non-Euclidean sensor layouts: Graphs from geodesic neighborhoods, point clouds on surfaces, or non-orientable domains where a global orientation is impossible; local gauge handling keeps the model consistent.

Avoid gauge-equivariant machinery if: signals are purely scalar and orientation-agnostic (a standard GNN/CNN suffices), the domain is flat with a fixed global frame (ordinary SE(2)/SO(3) equivariance may be simpler), or obtaining a reliable discrete connection (parallel transport) is infeasible or noisy enough to dominate errors.

āš ļøCommon Mistakes

  • Confusing invariance with equivariance: Invariance forces outputs not to change; equivariance requires predictable change (e.g., vectors rotate). Using invariant pooling when you need vector outputs discards information.
  • Forgetting frame alignment: Aggregating neighbors without parallel transport (or with the wrong sign) breaks equivariance. Always rotate neighbor features into the receiver’s frame before combining.
  • Mixing representation orders: Weight matrices must not mix different harmonic orders (m); only mix channels within the same order. Cross-order mixing violates the intertwiner constraint and breaks equivariance.
  • Using standard nonlinearities on vector/tensor channels: ReLU on each component is not equivariant. Use norm-based, gated, or tensor product nonlinearities designed for the representation.
  • Ignoring reflections when the domain/group is O(2) rather than SO(2): If local frames can flip, include reflection parity and use the correct representation; otherwise, predictions will be inconsistent on non-orientable meshes.
  • Inconsistent angle conventions: Swapping source/target, using degrees vs. radians, or adding instead of subtracting angles in updates like Īø'_{u←v} = Īø_{u←v} + γ(u) āˆ’ γ(v) yields subtle bugs.
  • Numerical drift: Accumulating many transports around loops (holonomy) can cause angle-wrap issues; always normalize angles to (āˆ’Ļ€, Ļ€] and test loop consistency.
  • Overcomplicating kernels: Start with simple radial weights shared per order; only add complex steerable bases once the pipeline is correct and numerically verified.

Key Formulas

Gauge Equivariance

F[s^g](x) = ρ_out(g(x)) F[s](x)

Explanation: This states that if inputs are transformed by a gauge field g via the input representation, the layer’s outputs transform by the output representation at each point. The computation commutes with any choice of local frames.

Input Gauge Action

s^g(x) = ρ_in(g(x)) s(x)

Explanation: A feature section s transforms pointwise under a gauge change g by the input representation. This encodes how components rotate with the frame.

Steerability Constraint

K(RĀ·p) = ρ_out(R) K(p) ρ_in(R)^{āˆ’1},  R ∈ G

Explanation: The kernel defined in a local frame must transform consistently when we rotate the local coordinates. This ensures the resulting convolution does not depend on the arbitrary frame.

Manifold Convolution (Local Chart)

(F s)(x) = ∫_{T_x M} K_x(p) s(exp_x(p)) dp

Explanation: At each point x, integrate a kernel over the tangent space using the exponential map to sample neighbors. With a steerable kernel, this is gauge independent.

SO(2) Irrep (Order m>0)

R_m(Īø) = [ cos(mĪø)  āˆ’sin(mĪø)
           sin(mĪø)   cos(mĪø) ]

Explanation: Rotating the frame by Īø rotates a sine/cosine pair of order m by mĪø with a 2Ɨ2 rotation block. This is the basic building block for feature rotation.

Discrete Gauge-Equivariant Message Passing

h^{(l+1)}(u) = Ī£_{v ∈ N(u)} ρ_out(g_{u←v}) Φ(d_{uv}) ρ_in(g_{u←v})^{āˆ’1} h^{(l)}(v)

Explanation: Neighbor features are first aligned to u’s frame via the connection gu←v​, processed by order-wise weights Φ (intertwiners), and summed. This guarantees equivariance if Φ respects representation blocks.

Gauge Update of Edge Transport (SO(2))

Īø'_{u←v} = Īø_{u←v} + γ(u) āˆ’ γ(v)

Explanation: When you rotate the frame at each vertex by a gauge field γ, the edge transport angles must be updated accordingly. This preserves the meaning of parallel transport between frames.

Feature Dimension by Orders

F = Ī£_{m=0}^{L} d_m C_m,  with d_0 = 1 and d_m = 2 for m > 0

Explanation: The total number of scalar components per vertex equals the sum over orders: 1 per scalar channel (m=0) and 2 per vector-like channel (m>0). This is useful for complexity estimates.

Norm-based Equivariant Nonlinearity

φ_m(v) = σ(‖v‖) Ā· v / (‖v‖ + ε)

Explanation: A function of the norm (invariant under rotations) scales the vector without changing its direction relative to the frame. This preserves equivariance for m>0 channels.

Complexity Analysis

Let V be the number of vertices, E the number of (directed) edges, and let feature types be indexed by orders m = 0..L with C_m channels per order. Each order contributes d_m components per channel, where d_0 = 1 and d_m = 2 for m > 0. The total feature dimension per vertex is F = Ī£_m d_m C_m.

A gauge-equivariant message-passing layer performs, for each edge, (1) a rotation of neighbor features by the edge's transport angle and (2) an order-wise linear mixing by weights that do not couple different m. The rotation cost per edge is O(F) with very small constants (just computing cos(mĪø), sin(mĪø) and applying 2Ɨ2 blocks for m > 0). If mixing is done per edge, the linear map per order costs O(C_m^in Ā· C_m^out Ā· d_m), giving a total per-edge cost of O(Ī£_m d_m C_m^in C_m^out). Summed over edges, the layer time complexity is O(E Ā· (F + Ī£_m d_m C_m^in C_m^out)). When C_m^in = C_m^out = C across orders and L is small, this is effectively O(E Ā· (F + C²L)). If mixing is deferred until after summation (align first, sum, then apply the linear map once per node), the cost becomes O(E Ā· F + V Ā· Ī£_m d_m C_m^in C_m^out), which is preferable when degrees are large.

Space complexity is O(V Ā· F + E) for storing features and edge transports, plus O(Ī£_m C_m^in C_m^out) for the weights (independent of V and E). Caching sin/cos(mĪø) per edge and order costs O(E Ā· L) extra space but reduces recomputation. Overall, the asymptotics match standard GNNs with a small overhead for frame alignment, while delivering strong inductive bias from equivariance.

Code Examples

Utility: SO(2) circular-harmonic feature rotation (orders m=0..L)
#include <bits/stdc++.h>
using namespace std;

// Rotate a feature vector organized by SO(2) orders m=0..L.
// For each order m:
//   - m==0: C_m scalar channels (unchanged by rotation)
//   - m>0 : C_m (cos,sin) pairs per channel, rotated by angle m*theta
// Layout per m is contiguous: [m=0 scalars][m=1 (c0,s0),(c1,s1),...][m=2 ...]

struct HarmonicLayout {
    vector<int> C;    // C[m] = number of channels at order m
    vector<int> off;  // starting offset for order m
    vector<int> size; // total scalar components for order m
    int total = 0;    // total feature dimension

    explicit HarmonicLayout(const vector<int>& C_) : C(C_) {
        int L = (int)C.size() - 1;
        off.resize(L + 1); size.resize(L + 1);
        int cur = 0;
        for (int m = 0; m <= L; ++m) {
            int dm = (m == 0 ? 1 : 2);
            off[m] = cur;
            size[m] = dm * C[m];
            cur += size[m];
        }
        total = cur;
    }
};

vector<double> rotate_so2(const vector<double>& feat, double theta, const HarmonicLayout& H) {
    vector<double> out(H.total);
    for (int m = 0; m < (int)H.C.size(); ++m) {
        int C_m = H.C[m];
        int base = H.off[m];
        if (m == 0) {
            for (int c = 0; c < C_m; ++c) out[base + c] = feat[base + c];
        } else {
            double ang = m * theta;
            double cs = cos(ang), sn = sin(ang);
            for (int c = 0; c < C_m; ++c) {
                double a = feat[base + 2*c + 0]; // cos component
                double b = feat[base + 2*c + 1]; // sin component
                // [a'; b'] = R_m(theta) [a; b]
                out[base + 2*c + 0] = cs * a - sn * b;
                out[base + 2*c + 1] = sn * a + cs * b;
            }
        }
    }
    return out;
}

int main() {
    // Example: orders m=0..2 with C0=1 scalar, C1=2 vector channels, C2=1 second-order channel
    HarmonicLayout H({1, 2, 1});
    cout << "Total feature dim = " << H.total << "\n";

    // Build a feature: [m=0: s0] [m=1: (a1,b1),(a2,b2)] [m=2: (c1,d1)]
    vector<double> f(H.total);
    f[H.off[0] + 0] = 3.0;   // m=0 scalar
    f[H.off[1] + 0] = 1.0;   // m=1, channel 1: a1
    f[H.off[1] + 1] = 0.5;   //                 b1
    f[H.off[1] + 2] = -0.3;  // m=1, channel 2: a2
    f[H.off[1] + 3] = 2.0;   //                 b2
    f[H.off[2] + 0] = 0.7;   // m=2, channel 1: c1
    f[H.off[2] + 1] = -1.2;  //                 d1

    double theta = 0.4; // radians
    auto fr = rotate_so2(f, theta, H);

    // Rotate back by -theta; should recover the original
    auto f_rr = rotate_so2(fr, -theta, H);

    cout.setf(std::ios::fixed); cout << setprecision(6);
    cout << "Original vs. rotated-back (should match)\n";
    double max_err = 0.0;
    for (int i = 0; i < H.total; ++i) {
        max_err = max(max_err, fabs(f[i] - f_rr[i]));
        cout << i << ": " << f[i] << " vs " << f_rr[i] << "\n";
    }
    cout << "Max error: " << max_err << "\n";
    return 0;
}

This program defines a layout for SO(2) harmonic orders and rotates features accordingly. Scalar channels (m=0) stay fixed, while each (cos,sin) pair for m>0 is rotated by mĀ·Īø using a 2Ɨ2 block. The test rotates a feature by Īø and back by āˆ’Īø to verify correctness numerically.

Time: O(F) per rotation, where F = Ī£_m d_m C_m.
Space: O(F) auxiliary space for the output vector.
Gauge-equivariant message passing layer on a mesh (SO(2), order-wise intertwiners)
#include <bits/stdc++.h>
using namespace std;

struct HarmonicLayout {
    vector<int> C, off, size;
    int total = 0;
    explicit HarmonicLayout(const vector<int>& C_) : C(C_) {
        int L = (int)C.size() - 1;
        off.resize(L + 1); size.resize(L + 1);
        int cur = 0;
        for (int m = 0; m <= L; ++m) {
            int dm = (m == 0 ? 1 : 2);
            off[m] = cur; size[m] = dm * C[m]; cur += size[m];
        }
        total = cur;
    }
};

vector<double> rotate_so2(const vector<double>& feat, double theta, const HarmonicLayout& H) {
    vector<double> out(H.total);
    for (int m = 0; m < (int)H.C.size(); ++m) {
        int Cm = H.C[m], base = H.off[m];
        if (m == 0) {
            for (int c = 0; c < Cm; ++c) out[base + c] = feat[base + c];
        } else {
            double ang = m * theta, cs = cos(ang), sn = sin(ang);
            for (int c = 0; c < Cm; ++c) {
                double a = feat[base + 2*c + 0], b = feat[base + 2*c + 1];
                out[base + 2*c + 0] = cs * a - sn * b;
                out[base + 2*c + 1] = sn * a + cs * b;
            }
        }
    }
    return out;
}

// Directed edge with transport angle from v -> u (aligns v's frame to u's frame)
struct Edge { int v; double theta; double w; };

// Gauge-equivariant layer: per-order weight matrices (do not mix orders)
struct GaugeLayer {
    HarmonicLayout H_in, H_out;
    vector<vector<double>> W0;          // m=0: [C0_out x C0_in]
    vector<vector<vector<double>>> Wm;  // m>0: per m, [C_out x C_in]

    GaugeLayer(const HarmonicLayout& Hin, const HarmonicLayout& Hout)
        : H_in(Hin), H_out(Hout) {
        int L = (int)H_in.C.size() - 1; // assume same L in/out
        // Initialize weights with small random values
        std::mt19937 rng(42); std::normal_distribution<double> N(0.0, 0.1);
        W0.assign(H_out.C[0], vector<double>(H_in.C[0]));
        for (int i = 0; i < H_out.C[0]; ++i)
            for (int j = 0; j < H_in.C[0]; ++j) W0[i][j] = N(rng);
        Wm.resize(L + 1);
        for (int m = 1; m <= L; ++m) {
            Wm[m].assign(H_out.C[m], vector<double>(H_in.C[m]));
            for (int i = 0; i < H_out.C[m]; ++i)
                for (int j = 0; j < H_in.C[m]; ++j) Wm[m][i][j] = N(rng);
        }
    }

    // Forward on a graph: adj[u] lists incoming neighbors v with transport theta_{u<-v}
    vector<vector<double>> forward(const vector<vector<Edge>>& adj, const vector<vector<double>>& X) {
        int V = (int)adj.size();
        vector<vector<double>> Y(V, vector<double>(H_out.total, 0.0));
        for (int u = 0; u < V; ++u) {
            // Accumulators per order, expressed in u's frame
            vector<double> acc0(H_out.C[0], 0.0);
            vector<vector<double>> accm(H_out.C.size()); // m>0: (cos,sin) pairs per out channel
            for (int m = 1; m < (int)H_out.C.size(); ++m)
                accm[m] = vector<double>(2 * H_out.C[m], 0.0);

            for (const Edge& e : adj[u]) {
                // Align the neighbor feature to u's frame
                vector<double> x_align = rotate_so2(X[e.v], e.theta, H_in);
                // m=0: scalar mixing
                for (int i = 0; i < H_out.C[0]; ++i) {
                    double sum = 0.0;
                    for (int j = 0; j < H_in.C[0]; ++j)
                        sum += W0[i][j] * x_align[H_in.off[0] + j];
                    acc0[i] += e.w * sum;
                }
                // m>0: mix channels within the same m; apply weights identically to cos/sin components
                for (int m = 1; m < (int)H_in.C.size(); ++m) {
                    int Cin = H_in.C[m], Cout = H_out.C[m], inb = H_in.off[m];
                    for (int i = 0; i < Cout; ++i) {
                        double sum_c = 0.0, sum_s = 0.0;
                        for (int j = 0; j < Cin; ++j) {
                            double w_ij = Wm[m][i][j];
                            sum_c += w_ij * x_align[inb + 2*j + 0];
                            sum_s += w_ij * x_align[inb + 2*j + 1];
                        }
                        accm[m][2*i + 0] += e.w * sum_c;
                        accm[m][2*i + 1] += e.w * sum_s;
                    }
                }
            }
            // Write accumulators into the output feature vector Y[u]
            for (int i = 0; i < H_out.C[0]; ++i) Y[u][H_out.off[0] + i] = acc0[i];
            for (int m = 1; m < (int)H_out.C.size(); ++m)
                for (int i = 0; i < H_out.C[m]; ++i) {
                    Y[u][H_out.off[m] + 2*i + 0] = accm[m][2*i + 0];
                    Y[u][H_out.off[m] + 2*i + 1] = accm[m][2*i + 1];
                }
        }
        return Y;
    }
};

int main() {
    // Simple graph: 4 vertices in a chain 0-1-2-3
    int V = 4; vector<vector<Edge>> adj(V);
    auto add_dir = [&](int u, int v, double theta, double w) { adj[u].push_back({v, theta, w}); };

    // Example SO(2) orders: m=0: 1 channel, m=1: 1 channel (2 comps) => total dim = 3
    HarmonicLayout Hin({1, 1}), Hout({1, 1});

    // Edge transports (toy angles); each undirected edge becomes two directed edges with opposite signs
    add_dir(0, 1, +0.2, 1.0); add_dir(1, 0, -0.2, 1.0);
    add_dir(1, 2, -0.1, 1.0); add_dir(2, 1, +0.1, 1.0);
    add_dir(2, 3, +0.3, 1.0); add_dir(3, 2, -0.3, 1.0);

    // Random input features per vertex
    std::mt19937 rng(0); std::uniform_real_distribution<double> U(-1.0, 1.0);
    vector<vector<double>> X(V, vector<double>(Hin.total));
    for (int u = 0; u < V; ++u)
        for (int i = 0; i < Hin.total; ++i) X[u][i] = U(rng);

    GaugeLayer layer(Hin, Hout);
    auto Y = layer.forward(adj, X);

    cout.setf(std::ios::fixed); cout << setprecision(4);
    for (int u = 0; u < V; ++u) {
        cout << "Y[" << u << "] = ";
        for (double z : Y[u]) cout << z << ' ';
        cout << '\n';
    }
    return 0;
}

This program implements a single gauge-equivariant message passing layer on a small directed graph. Each edge stores a parallel transport angle that aligns neighbor features to the receiver’s frame. Features are organized by harmonic order; weights mix channels only within the same order (intertwiners), preserving equivariance. The output at each node is the sum of aligned, linearly mixed neighbor features.

Time: O(E Ā· (F + Ī£_m d_m C_m^in C_m^out)), typically dominated by O(E Ā· F).
Space: O(V Ā· F + E) for data and transports, plus O(Ī£_m C_m^in C_m^out) for weights.
Numerical verification of gauge equivariance under random per-vertex gauges
#include <bits/stdc++.h>
using namespace std;

struct HarmonicLayout {
    vector<int> C, off, size;
    int total = 0;
    explicit HarmonicLayout(const vector<int>& C_) : C(C_) {
        int L = (int)C.size() - 1;
        off.resize(L + 1); size.resize(L + 1);
        int cur = 0;
        for (int m = 0; m <= L; ++m) {
            int dm = (m == 0 ? 1 : 2);
            off[m] = cur; size[m] = dm * C[m]; cur += size[m];
        }
        total = cur;
    }
};

vector<double> rotate_so2(const vector<double>& x, double theta, const HarmonicLayout& H) {
    vector<double> y(H.total);
    for (int m = 0; m < (int)H.C.size(); ++m) {
        int Cm = H.C[m], base = H.off[m];
        if (m == 0) {
            for (int c = 0; c < Cm; ++c) y[base + c] = x[base + c];
        } else {
            double a = m * theta, cs = cos(a), sn = sin(a);
            for (int c = 0; c < Cm; ++c) {
                double xc = x[base + 2*c + 0], xs = x[base + 2*c + 1];
                y[base + 2*c + 0] = cs * xc - sn * xs;
                y[base + 2*c + 1] = sn * xc + cs * xs;
            }
        }
    }
    return y;
}

struct Edge { int v; double theta; double w; };

struct GaugeLayer {
    HarmonicLayout H;
    vector<double> W0, W1; // simplified: same in/out channels and only m=0,1
    GaugeLayer(const HarmonicLayout& H_) : H(H_) {
        std::mt19937 rng(123); std::normal_distribution<double> N(0.0, 0.1);
        W0.resize(H.C[0] * H.C[0]); for (double& z : W0) z = N(rng);
        W1.resize(H.C[1] * H.C[1]); for (double& z : W1) z = N(rng);
    }
    vector<vector<double>> forward(const vector<vector<Edge>>& adj, const vector<vector<double>>& X) {
        int V = (int)adj.size();
        vector<vector<double>> Y(V, vector<double>(H.total, 0.0));
        for (int u = 0; u < V; ++u) {
            vector<double> acc0(H.C[0], 0.0), acc1(2 * H.C[1], 0.0);
            for (const Edge& e : adj[u]) {
                auto xa = rotate_so2(X[e.v], e.theta, H); // align into u's frame
                // m=0 mixing: y0_i += w * sum_j W0[i,j] * x0_j
                for (int i = 0; i < H.C[0]; ++i) {
                    double s = 0.0;
                    for (int j = 0; j < H.C[0]; ++j) s += W0[i*H.C[0] + j] * xa[H.off[0] + j];
                    acc0[i] += e.w * s;
                }
                // m=1 mixing: apply the same weights to cos and sin components
                for (int i = 0; i < H.C[1]; ++i) {
                    double sc = 0.0, ss = 0.0;
                    for (int j = 0; j < H.C[1]; ++j) {
                        double w = W1[i*H.C[1] + j];
                        sc += w * xa[H.off[1] + 2*j + 0];
                        ss += w * xa[H.off[1] + 2*j + 1];
                    }
                    acc1[2*i + 0] += e.w * sc; acc1[2*i + 1] += e.w * ss;
                }
            }
            for (int i = 0; i < H.C[0]; ++i) Y[u][H.off[0] + i] = acc0[i];
            for (int i = 0; i < H.C[1]; ++i) {
                Y[u][H.off[1] + 2*i + 0] = acc1[2*i + 0];
                Y[u][H.off[1] + 2*i + 1] = acc1[2*i + 1];
            }
        }
        return Y;
    }
};

// Apply a random gauge gamma[u]: rotate features and update edge transports
// via theta' = theta + gamma(u) - gamma(v), then normalize angles to (-pi, pi]
void apply_gauge(const vector<double>& gamma, const HarmonicLayout& H,
                 vector<vector<double>>& X, vector<vector<Edge>>& adj) {
    int V = (int)X.size();
    for (int u = 0; u < V; ++u) X[u] = rotate_so2(X[u], gamma[u], H);
    auto wrap = [](double a) { while (a <= -M_PI) a += 2*M_PI; while (a > M_PI) a -= 2*M_PI; return a; };
    for (int u = 0; u < V; ++u)
        for (Edge& e : adj[u]) e.theta = wrap(e.theta + gamma[u] - gamma[e.v]);
}

int main() {
    // Graph: 5 nodes in a ring, so loops have nontrivial holonomy
    int V = 5; vector<vector<Edge>> adj(V);
    auto add_bidir = [&](int u, int v, double th) {
        adj[u].push_back({v, th, 1.0}); adj[v].push_back({u, -th, 1.0});
    };
    add_bidir(0, 1, 0.20); add_bidir(1, 2, -0.10); add_bidir(2, 3, 0.25);
    add_bidir(3, 4, -0.15); add_bidir(4, 0, 0.05);

    // Orders m=0: 1 scalar, m=1: 2 vector channels (dim = 1 + 2*2 = 5)
    HarmonicLayout H({1, 2});

    // Random features
    std::mt19937 rng(7); std::uniform_real_distribution<double> U(-1.0, 1.0);
    vector<vector<double>> X(V, vector<double>(H.total));
    for (int u = 0; u < V; ++u)
        for (int i = 0; i < H.total; ++i) X[u][i] = U(rng);

    GaugeLayer layer(H);
    auto Y1 = layer.forward(adj, X); // baseline output

    // Draw random per-vertex gauge angles and apply them to copies of X and adj
    vector<double> gamma(V); for (double& g : gamma) g = U(rng);
    auto Xg = X; auto adjg = adj;
    apply_gauge(gamma, H, Xg, adjg);

    // Forward under the new gauge
    auto Y2 = layer.forward(adjg, Xg);

    // Rotate Y2 back by -gamma[u] and compare with Y1
    double max_err = 0.0, sum_err = 0.0; int cnt = 0;
    for (int u = 0; u < V; ++u) {
        auto Y2b = rotate_so2(Y2[u], -gamma[u], H);
        for (int i = 0; i < H.total; ++i) {
            double err = fabs(Y1[u][i] - Y2b[i]);
            max_err = max(max_err, err); sum_err += err; ++cnt;
        }
    }
    cout.setf(std::ios::fixed); cout << setprecision(6);
    cout << "Equivariance check: max abs error = " << max_err
         << ", mean abs error = " << (sum_err / cnt) << "\n";
    return 0;
}

This program verifies equivariance numerically. It computes a layer output, then applies a random per-vertex gauge (rotating features and updating edge transport angles) and recomputes. Finally, it rotates the new outputs back by the inverse gauge and compares them to the original outputs. The tiny numerical error demonstrates correct gauge equivariance.

Time: O(E Ā· F) for each forward pass, plus O(V Ā· F + E Ā· L) to apply a gauge (updating features and angles).
Space: O(V Ā· F + E) for features and edges; weights are O(Ī£_m C_m²) in this simplified setting.
#gauge equivariant networks Ā· #geometric deep learning Ā· #manifold learning Ā· #parallel transport Ā· #so(2) Ā· #o(2) Ā· #steerable kernels Ā· #circular harmonics Ā· #message passing Ā· #tangent bundle Ā· #fiber bundle Ā· #connection Ā· #equivariance Ā· #nonlinearity Ā· #mesh convolution