Graph Isomorphism & WL Test
Key Points
- â˘Graph isomorphism asks whether two graphs are the same up to renaming vertices; the WeisfeilerâLeman (WL) test is a powerful heuristic that often distinguishes non-isomorphic graphs quickly.
- â˘The 1-dimensional WL test (color refinement) repeatedly recolors each vertex using its current color and the multiset of its neighborsâ colors until stabilization.
- â˘If two graphs have different stabilized color histograms under the same WL process, they are provably non-isomorphic; if they match, graphs are WL-equivalent but not guaranteed isomorphic.
- â˘WL can be implemented efficiently in C++ by sorting neighbor color lists or by hashing color multisets, with practical time near O(m log n) per iteration.
- â˘The test extends to k-WL, which colors k-tuples of vertices and is strictly stronger but more expensive; 2-WL already handles many tricky cases that 1-WL misses.
- â˘WL color refinement also produces permutation-invariant graph signatures useful for indexing, hashing, and machine learning kernels.
- â˘Common pitfalls include mixing independent color id spaces across graphs, relying on 64-bit hashes without collision control, and stopping too early before color stabilization.
- â˘WL is best used as a fast pre-filter before exact isomorphism algorithms (e.g., individualizationârefinement or canonical labeling) in practical toolchains.
Prerequisites
- âGraphs and adjacency representations â WL operates on graph neighborhoods; understanding adjacency lists/matrices is essential.
- âBig-O complexity â To analyze per-iteration and total costs of WL refinement.
- âSorting and multisets â WL signatures encode neighbor color multisets using sorted lists or counts.
- âHashing and collision handling â Efficient implementations may hash signatures; awareness of collisions prevents false matches.
- âEquivalence relations and partitions â WL maintains and refines a partition of the vertex set; equitable partitions are key.
- âPermutations and isomorphisms â Graph isomorphism involves vertex permutations that preserve adjacency.
- âUndirected vs directed/labeled graphs â Signature construction must adapt to direction and labels appropriately.
- âMap and set usage in C++ â Implementing deterministic signature compression relies on ordered maps or careful hashing.
Detailed Explanation
Tap terms for definitions01Overview
Graph isomorphism is the problem of deciding whether two graphs are identical up to a relabeling of vertices. Brute-force checking is impractical because there are n! possible relabelings. The WeisfeilerâLeman (WL) test is a family of combinatorial procedures that compares graphs by iteratively refining a coloring (partition) of vertices. The simplest version, 1-WL (also called color refinement), starts by assigning colors to vertices (often by degree) and then repeatedly updates each vertexâs color based on its current color and the multiset of its neighborsâ colors. The process stabilizes when no further refinement (splitting of color classes) happens. Two graphs that stabilize to different color histograms cannot be isomorphic. WL is not a complete isomorphism test: there exist non-isomorphic graphs that are indistinguishable by 1-WL (and even by k-WL for fixed k). Nevertheless, 1-WL is extremely effective in practice on many real-world graphs, is fast to compute, and forms the backbone of more advanced methods. The method generalizes to k-WL, which refines colors on k-tuples of vertices and is strictly more powerful but has higher computational cost. The WL procedure also yields permutation-invariant graph fingerprints (hashes) and features used in graph kernels and graph neural networks.
02Intuition & Analogies
Think of each vertex as a person wearing a colored shirt. Initially, everyone might wear the same color, or a color based on an easy feature like their number of friends (degree). At each round, each person looks at the shirt colors of their friends, collects them into a bag (order doesnât matter), and then changes their own shirt to a new color that summarizes two things: their previous color and the multiset of their friendsâ colors. If two people have identical bags and the same previous color, they get the same new shirt color; otherwise, their colors may split into finer distinctions. We repeat this wardrobe update for everyone simultaneously. Over time, people with indistinguishable social surroundings end up wearing the same color, while structurally different people separate into different colors. Now imagine running this process in parallel on two different social networks. If at some point the color distributions donât match (say, one network has five people in green and the other has six), we can conclude the networks canât be the same under any renaming: their ârole countsâ differ. If the distributions keep matching all the way until no oneâs shirt changes anymore, the networks look the same from the perspective of this color-refinement process. That doesnât prove they are truly the same network (some clever constructions can fool the process), but itâs a strong and fast check that often suffices. This intuition generalizes: instead of looking at individual people (1-WL), you can look at pairs, triples, or k-tuples of people (k-WL), updating their tuple-colors using the colors of closely related tuples. This sees deeper structural patterns but costs more time and memory.
03Formal Definition
04When to Use
- As a fast pre-filter for graph isomorphism: run 1-WL to quickly reject many non-isomorphic pairs before invoking heavier exact algorithms (e.g., individualizationârefinement or canonical labeling tools like nauty/Traces/Bliss).
- For graph indexing and deduplication at scale: compute a permutation-invariant WL-based hash to cluster likely-isomorphic graphs, then verify candidates exactly.
- As features for machine learning: WL color histograms or subtree-pattern features (WeisfeilerâLehman kernel) are effective for classification and regression on graph-structured data.
- For symmetry detection: stabilized colors approximate vertex orbits; vertices with different WL colors cannot be in the same automorphism orbit, helping prune search.
- When the graphs are large and sparse and exact GI would be expensive: 1-WL runs near-linearly per round and often separates practical instances quickly.
- Avoid 1-WL alone when dealing with known counterexamples (e.g., certain strongly regular graphs or CFI constructions) or when you need a proof of isomorphism/non-isomorphism; combine with stronger methods (2-WL, IR search).
â ď¸Common Mistakes
- Mixing color id spaces across graphs: running refinement independently on each graph and then comparing color numbers is meaningless. Always refine both graphs with a shared, deterministic encoding of signatures (global compression) when comparing.
- Over-trusting hashes: using 64-bit hashes for signatures can create accidental collisions, potentially declaring WL-equivalence when signatures differ. Prefer exact canonical compression via ordered containers (e.g., std::map over structured signatures), or use wide hashes and collision checks.
- Stopping too early: checking only that the number of colors didnât change once may miss a final idempotent pass. A safe criterion is that the number of colors doesnât increase and the induced partition is unchanged (or simply run one extra pass after a non-increase).
- Ignoring degree initialization: starting with all-equal initial colors works but takes more iterations. Degree-based initialization dramatically accelerates convergence and detects obvious mismatches immediately.
- Forgetting permutation invariance in graph fingerprints: per-vertex signatures must be aggregated in an order-insensitive way (e.g., sort them) before hashing; otherwise, different labelings produce different hashes.
- Mishandling directed, labeled, or multigraph variants: the signature must incorporate edge directions, labels, or multiplicities consistently, or you risk incorrect conclusions.
- Performance pitfalls: repeatedly sorting long neighbor color lists naively can be slow; use counting over color ids or stable integer encoding to reduce overhead. Avoid O(n^2) operations on dense graphs where O(m log n) approaches suffice.
Key Formulas
Graph Isomorphism Definition
Explanation: Two graphs are isomorphic if there exists a bijection between their vertices that preserves edges in both directions. This formalizes the idea of relabeling vertices without changing structure.
1-WL Update Rule
Explanation: Each vertexâs new color depends on its previous color and the multiset of neighbor colors. The compress operator maps identical signatures to the same integer id consistently.
Neighborhood Multiset
Explanation: The multiset collects neighbor colors with counts, ignoring order. It captures the local color distribution around a vertex.
WL Stabilization Criterion
Explanation: WL stops when no further splitting of color classes occurs between iterations. Counting colors is a quick proxy; an extra pass ensures idempotence.
Equitable Partition Property
Explanation: Within each color class, all vertices have the same number of neighbors in any other class. WL reaches the coarsest equitable partition.
k-WL Tuple Update
Explanation: For k-WL, tuples are recolored using colors of tuples formed by replacing one component. This strictly increases distinguishing power with k.
Handshake Lemma
Explanation: The sum of degrees in an undirected graph equals twice the number of edges m. This helps bound per-iteration work in WL by total degree.
1-WL Time Complexity (typical)
Explanation: With sorting-based multiset encoding, each iteration costs about m log n to sort neighbor colors plus n log n to compress signatures. I is the number of iterations, at most n but typically much smaller.
Iteration Bound
Explanation: Each strict refinement increases the number of color classes by at least one, which cannot exceed n. Therefore, the number of iterations is bounded by nâ1.
Permutation-Invariant WL Hash
Explanation: A graph-level fingerprint can be built by sorting per-vertex WL signatures and hashing the sequence. Sorting removes dependence on vertex order.
Complexity Analysis
Code Examples
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 struct Graph { 5 int n; // number of vertices 6 vector<vector<int>> adj; // adjacency list (0-indexed, undirected) 7 Graph(int n_=0): n(n_), adj(n_) {} 8 void add_edge(int u, int v) { 9 adj[u].push_back(v); 10 adj[v].push_back(u); 11 } 12 }; 13 14 // Perform 1-WL color refinement until stabilization. 15 // Returns a vector of stable color ids (0..k-1) for vertices 0..n-1. 16 vector<int> wl_refinement(const Graph &G) { 17 int n = G.n; 18 vector<int> color(n); 19 // Degree-based initialization (faster convergence than uniform) 20 for (int v = 0; v < n; ++v) color[v] = (int)G.adj[v].size(); 21 22 // Compress initial degrees to consecutive ids to keep color ids small 23 vector<int> uniq = color; 24 sort(uniq.begin(), uniq.end()); 25 uniq.erase(unique(uniq.begin(), uniq.end()), uniq.end()); 26 for (int v = 0; v < n; ++v) { 27 color[v] = (int)(lower_bound(uniq.begin(), uniq.end(), color[v]) - uniq.begin()); 28 } 29 30 int num_colors = (int)uniq.size(); 31 32 // Iteratively refine 33 for (int iter = 0; iter < n; ++iter) { // safe upper bound on iterations 34 // Build signatures: (old_color, sorted multiset of neighbor colors) 35 vector<pair<int, vector<int>>> sig(n); 36 for (int v = 0; v < n; ++v) { 37 vector<int> neigh; 38 neigh.reserve(G.adj[v].size()); 39 for (int u : G.adj[v]) neigh.push_back(color[u]); 40 sort(neigh.begin(), neigh.end()); // multiset encoding 41 sig[v] = { color[v], move(neigh) }; 42 } 43 44 // Compress signatures to new color ids deterministically 45 map<pair<int, vector<int>>, int> encoder; // ordered to avoid hash collisions 46 vector<int> new_color(n); 47 int next_id = 0; 48 for (int v = 0; v < n; ++v) { 49 auto it = encoder.find(sig[v]); 50 if (it == encoder.end()) { 51 encoder[sig[v]] = next_id; 52 new_color[v] = next_id++; 53 } else { 54 new_color[v] = it->second; 55 } 56 } 57 58 int new_num_colors = next_id; 59 // If no increase in number of colors, do one final pass check and stop 60 color.swap(new_color); 61 if (new_num_colors == num_colors) { 62 // One more pass would yield the same partition; stop. 63 break; 64 } 65 num_colors = new_num_colors; 66 } 67 return color; 68 } 69 70 int main() { 71 ios::sync_with_stdio(false); 72 cin.tie(nullptr); 73 74 // Example: a 6-cycle C6 75 Graph G(6); 76 for (int i = 0; i < 6; ++i) G.add_edge(i, (i+1)%6); 77 78 vector<int> colors = wl_refinement(G); 79 80 // Print stable color of each vertex 81 cout << "Stable colors (vertex:color):\n"; 82 for (int v = 0; v < G.n; ++v) { 83 cout << v << ":" << colors[v] << "\n"; 84 } 85 return 0; 86 } 87
This program implements 1-WL color refinement. It initializes colors by degree, then repeatedly replaces each vertexâs color with a compressed id derived from its old color and the sorted multiset of neighbor colors. It stops when the number of colors no longer increases (the partition is equitable). The example runs on a 6-cycle, where all vertices remain equivalent under WL and share the same final color.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 struct Graph { 5 int n; vector<vector<int>> adj; 6 Graph(int n_=0): n(n_), adj(n_) {} 7 void add_edge(int u,int v){ adj[u].push_back(v); adj[v].push_back(u); } 8 }; 9 10 // Compress a vector of integers to consecutive ids in [0,k) 11 static inline void compress_ids(vector<int>& a){ 12 vector<int> b = a; sort(b.begin(), b.end()); b.erase(unique(b.begin(), b.end()), b.end()); 13 for (int &x : a) x = (int)(lower_bound(b.begin(), b.end(), x) - b.begin()); 14 } 15 16 // Return true iff G and H are indistinguishable by 1-WL (WL-equivalent). 17 // If it returns false, they are CERTAINLY non-isomorphic. If true, they may or may not be isomorphic. 18 bool wl_equivalent_1d(const Graph& G, const Graph& H){ 19 if (G.n != H.n) return false; 20 int n = G.n; 21 auto edge_count = [](const Graph& X){ long long m=0; for(auto& v: X.adj) m+=v.size(); return m/2; }; 22 if (edge_count(G) != edge_count(H)) return false; // quick reject 23 24 // Joint initialization by degree, then compress jointly to align color ids across graphs 25 vector<int> colG(n), colH(n); 26 for (int v=0; v<n; ++v){ colG[v] = (int)G.adj[v].size(); colH[v] = (int)H.adj[v].size(); } 27 // Build a joint list to compress to common ids 28 vector<int> all; all.reserve(2*n); 29 all.insert(all.end(), colG.begin(), colG.end()); 30 all.insert(all.end(), colH.begin(), colH.end()); 31 compress_ids(all); 32 for (int v=0; v<n; ++v){ colG[v] = all[v]; colH[v] = all[n+v]; } 33 34 int num_colors = -1; 35 for (int iter = 0; iter < n; ++iter){ 36 // Build signatures for both graphs 37 vector<pair<int, vector<int>>> sigG(n), sigH(n); 38 for (int v=0; v<n; ++v){ 39 vector<int> ngh; 40 ngh.reserve(G.adj[v].size()); 41 for (int u: G.adj[v]) ngh.push_back(colG[u]); 42 sort(ngh.begin(), ngh.end()); 43 sigG[v] = { colG[v], move(ngh) }; 44 } 45 for (int v=0; v<n; ++v){ 46 vector<int> ngh; 47 ngh.reserve(H.adj[v].size()); 48 for (int u: H.adj[v]) ngh.push_back(colH[u]); 49 sort(ngh.begin(), ngh.end()); 50 sigH[v] = { colH[v], move(ngh) }; 51 } 52 // Jointly compress signatures to common ids so color names match across graphs 53 map<pair<int, vector<int>>, int> enc; 54 int next_id = 0; 55 vector<int> ncolG(n), ncolH(n); 56 for (int v=0; v<n; ++v){ auto it=enc.find(sigG[v]); if (it==enc.end()) it = enc.emplace(sigG[v], next_id++).first; ncolG[v] = it->second; } 57 for (int v=0; v<n; ++v){ auto it=enc.find(sigH[v]); if (it==enc.end()) it = enc.emplace(sigH[v], next_id++).first; ncolH[v] = it->second; } 58 59 // Compare color histograms; if different, graphs are non-isomorphic 60 vector<int> cntG(next_id), cntH(next_id); 61 for (int v=0; v<n; ++v){ cntG[ncolG[v]]++; cntH[ncolH[v]]++; } 62 if (cntG != cntH) return false; 63 64 // Check stabilization: number of colors did not increase and colors unchanged under joint encoding 65 if (next_id == num_colors && ncolG == colG && ncolH == colH) return true; // stabilized and equal 66 67 colG.swap(ncolG); colH.swap(ncolH); 68 num_colors = next_id; 69 } 70 // After n iterations, if never distinguished, treat as WL-equivalent 71 return true; 72 } 73 74 int main(){ 75 ios::sync_with_stdio(false); 76 cin.tie(nullptr); 77 78 // Example: Two 6-cycles should be WL-equivalent (and actually isomorphic) 79 Graph A(6), B(6); 80 for (int i=0;i<6;++i){ A.add_edge(i,(i+1)%6); B.add_edge(i,(i+1)%6); } 81 82 cout << (wl_equivalent_1d(A,B) ? "WL-equivalent\n" : "Distinguished by WL\n"); 83 84 // Modify B to break a symmetry: add a chord 85 B.add_edge(0,3); 86 cout << (wl_equivalent_1d(A,B) ? "WL-equivalent\n" : "Distinguished by WL\n"); 87 return 0; 88 } 89
This program compares two graphs using 1-WL with a shared color space. It jointly compresses initial degrees, builds joint signature encodings at each iteration, and compares color histograms. If the histograms differ, the graphs are certainly non-isomorphic. If the process stabilizes without differences, it returns true (WL-equivalent). The example first compares two identical cycles (WL-equivalent) and then adds a chord to distinguish them.
1 #include <bits/stdc++.h> 2 using namespace std; 3 4 struct Graph { 5 int n; vector<vector<int>> adj; 6 Graph(int n_=0): n(n_), adj(n_) {} 7 void add_edge(int u,int v){ adj[u].push_back(v); adj[v].push_back(u); } 8 }; 9 10 // SplitMix64: fast 64-bit mixer for hashing integers deterministically 11 static inline uint64_t splitmix64(uint64_t x){ 12 x += 0x9e3779b97f4a7c15ULL; 13 x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL; 14 x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL; 15 return x ^ (x >> 31); 16 } 17 18 vector<int> wl_refinement(const Graph &G){ 19 int n = G.n; vector<int> color(n); 20 for (int v=0; v<n; ++v) color[v] = (int)G.adj[v].size(); 21 vector<int> uniq = color; sort(uniq.begin(), uniq.end()); uniq.erase(unique(uniq.begin(), uniq.end()), uniq.end()); 22 for (int v=0; v<n; ++v) color[v] = (int)(lower_bound(uniq.begin(), uniq.end(), color[v]) - uniq.begin()); 23 int num_colors = (int)uniq.size(); 24 for (int iter=0; iter<n; ++iter){ 25 vector<pair<int, vector<int>>> sig(n); 26 for (int v=0; v<n; ++v){ 27 vector<int> neigh; neigh.reserve(G.adj[v].size()); 28 for (int u: G.adj[v]) neigh.push_back(color[u]); 29 sort(neigh.begin(), neigh.end()); 30 sig[v] = { color[v], move(neigh) }; 31 } 32 map<pair<int, vector<int>>, int> enc; int next_id=0; vector<int> ncol(n); 33 for (int v=0; v<n; ++v){ auto it=enc.find(sig[v]); if (it==enc.end()) it=enc.emplace(sig[v], next_id++).first; ncol[v] = it->second; } 34 if (next_id == num_colors) { color.swap(ncol); break; } 35 color.swap(ncol); num_colors = next_id; 36 } 37 return color; 38 } 39 40 // Build a permutation-invariant 64-bit hash using stabilized WL colors and per-vertex signatures 41 uint64_t wl_graph_hash(const Graph &G){ 42 vector<int> c = wl_refinement(G); 43 int n = G.n; 44 // Build per-vertex canonical signatures (color, sorted neighbor colors-by-color) and then sort vertices by their signatures 45 vector<vector<uint64_t>> vertex_sigs(n); 46 for (int v=0; v<n; ++v){ 47 vector<int> ngh; ngh.reserve(G.adj[v].size()); 48 for (int u: G.adj[v]) ngh.push_back(c[u]); 49 sort(ngh.begin(), ngh.end()); 50 // Encode signature into a sequence of 64-bit chunks to avoid string costs 51 vector<uint64_t> seq; seq.reserve(2 + ngh.size()); 52 seq.push_back((uint64_t) c[v]); 53 seq.push_back(0xFFFFFFFFFFFFFFFFULL); // separator 54 for (int x: ngh) seq.push_back((uint64_t) x); 55 vertex_sigs[v] = move(seq); 56 } 57 // Sort vertex signatures lexicographically to remove permutation dependence 58 sort(vertex_sigs.begin(), vertex_sigs.end()); 59 60 // Fold into a single 64-bit hash deterministically 61 uint64_t h = 0x1234567890abcdefULL; 62 for (auto &seq : vertex_sigs){ 63 uint64_t local = 0x9ddfea08eb382d69ULL; 64 for (uint64_t x : seq){ local = splitmix64(local ^ splitmix64(x + 0x9e3779b97f4a7c15ULL)); } 65 h = splitmix64(h ^ local); 66 } 67 // Also mix in n and m to distinguish trivial cases 68 size_t m2 = 0; for (auto &v: G.adj) m2 += v.size(); m2 /= 2; 69 h = splitmix64(h ^ splitmix64((uint64_t)n) ^ splitmix64((uint64_t)m2)); 70 return h; 71 } 72 73 int main(){ 74 ios::sync_with_stdio(false); 75 cin.tie(nullptr); 76 77 Graph G(4), H(4); 78 // G: path on 4 vertices 0-1-2-3 79 G.add_edge(0,1); G.add_edge(1,2); G.add_edge(2,3); 80 // H: another labeling of the same path 0-2-1-3 81 H.add_edge(0,2); H.add_edge(2,1); H.add_edge(1,3); 82 83 uint64_t hg = wl_graph_hash(G); 84 uint64_t hh = wl_graph_hash(H); 85 cout << hex << hg << "\n" << hh << "\n"; 86 cout << (hg==hh ? "Hashes match (likely isomorphic)\n" : "Hashes differ (non-isomorphic)\n"); 87 return 0; 88 } 89
This program computes a permutation-invariant 64-bit WL fingerprint: run 1-WL to stabilization, form per-vertex signatures from the final colors and sorted neighbor colors, sort these signatures, and fold them with a strong integer mixer. Isomorphic graphs tend to produce identical hashes; non-isomorphic graphs often differ. While collisions are unlikely with SplitMix64 and sorted signatures, a hash match is not a proof of isomorphism.