🎓 How I Study AI

Persistent Homology

Key Points

  • Persistent homology tracks how topological features (components, loops, voids) appear and disappear as you grow a scale parameter over a filtered simplicial complex.
  • A filtration is an increasing sequence of complexes where every simplex enters at a specific scale but only after all its faces.
  • The core computation reduces a boundary matrix over a field (often \(\mathbb{Z}_2\)) to obtain birth–death pairs and barcodes/diagrams.
  • Vietoris–Rips filtrations are built from point clouds by inserting simplices when all pairwise distances are below a threshold, and the filtration value is the maximum edge length.
  • Zero columns in the reduced boundary matrix create features; nonzero columns kill previously created features, forming birth–death pairs.
  • Long bars often indicate meaningful shape while short bars are more likely noise, and stability theorems quantify this robustness.
  • Computing only \(H_0\) (connected components) is very fast with union–find and is equivalent to Kruskal’s MST process.
  • Full persistent homology can be cubic in the number of simplices, so careful filtration design and low-dimensional truncation are crucial.

Prerequisites

  • Basic linear algebra — Understanding vector spaces, matrices, kernels, images, and rank is needed for chain groups and boundary matrix reduction.
  • Graph theory and combinatorics — Simplicial complexes generalize graphs; counting and constructing simplices relies on combinatorial reasoning.
  • Point-set topology (introductory) — Concepts like connectedness and loops clarify the meaning of \(H_0\) and \(H_1\) features.
  • Algorithms and data structures — Efficient implementations need sorting, hash maps, union–find, and sparse representations.
  • Metric spaces and distances — Rips filtrations depend on pairwise distances; properties like the triangle inequality matter.
  • Computational geometry (basics) — Alternative filtrations (alpha complexes) use Delaunay/Voronoi structures and geometric primitives.

Detailed Explanation


01 Overview

Persistent homology is a method from topological data analysis (TDA) that summarizes the shape of data across scales. Given data (for example, a point cloud), we do not fix a single scale; instead, we build a family of shapes—called a filtration—by steadily relaxing a scale parameter. As the scale grows, new connections form, loops appear, and voids are filled. Persistent homology records when each feature is born and when it dies, producing a multiscale signature of the data such as a barcode (intervals) or a persistence diagram (points in the plane). Practically, we first turn data into a simplicial complex (e.g., Vietoris–Rips or alpha complex), assign each simplex a filtration value that respects the face condition (faces appear no later than the simplex), sort simplices by this order, build a boundary matrix over a field, and perform a matrix reduction. The pivot structure of the reduced matrix reveals birth–death pairs across homology dimensions (\(H_0\) for components, \(H_1\) for loops, \(H_2\) for voids, etc.). The result is robust: small perturbations of the input cause small changes in the diagram. Persistent homology has become a standard tool for understanding structure in high-dimensional and noisy data.

02 Intuition & Analogies

Imagine slowly flooding a landscape. At first, isolated puddles form in the lowest areas—these are connected components (\(H_0\)). As the water level rises, puddles merge, and some components die—the moment two pools connect, one of their identities is lost. Occasionally, water can surround a hill, creating a loop (\(H_1\)); later, as the level keeps rising, that loop is filled in and dies. In higher dimensions, the same story repeats with voids (\(H_2\)), like air pockets in a sponge, which appear and later disappear. Persistent homology is the logbook of this flood, noting the water levels (scales) at which each feature appears (birth) and vanishes (death). For point clouds, replace the water with growing balls around each point. Where balls touch, edges appear; cycles of edges create loops; mutually touching triplets add triangles that can fill in and kill those loops. If a loop persists over a large range of radii, it’s like a ring-shaped canyon—hard to destroy—so it shows as a long bar and likely reflects real structure. Short bars often correspond to small, local irregularities—ripples in the terrain—more likely caused by noise. The beauty is that we don’t have to pick a single radius (which can be arbitrary and fragile); instead, we read structure across all radii and trust the features that endure.

03 Formal Definition

Let \(K\) be a finite simplicial complex with a filtration function \(f: K \to \mathbb{R}\) satisfying the face condition: if \(\tau\) is a face of \(\sigma\), then \(f(\tau) \le f(\sigma)\). The filtration is the nested sequence \(F_t = \{\sigma \in K : f(\sigma) \le t\}\), so that \(F_s \subseteq F_t\) for \(s \le t\). For each dimension \(k\) and each \(t\), define the chain group \(C_k(F_t; \mathbb{F})\) over a field \(\mathbb{F}\) with boundary maps \(\partial_k : C_k \to C_{k-1}\) satisfying \(\partial_k \circ \partial_{k+1} = 0\). The \(k\)-th homology at time \(t\) is \(H_k(F_t; \mathbb{F}) = \ker \partial_k / \operatorname{im} \partial_{k+1}\), with Betti number \(\beta_k(t) = \dim H_k(F_t; \mathbb{F})\). Inclusions \(F_s \hookrightarrow F_t\) induce linear maps \(\varphi_s^t : H_k(F_s) \to H_k(F_t)\), forming a persistence module. Under finiteness (tameness) assumptions, this module decomposes as a direct sum of interval modules, yielding a multiset of birth–death pairs \((b, d)\) (with \(b < d\) or \(d = \infty\)). Algorithmically, we sort simplices by increasing \(f\) (breaking ties by dimension and lexicographic order), build the boundary matrix over \(\mathbb{F}\), and apply column reduction. Columns that reduce to zero create \(k\)-dimensional classes at their filtration values; columns with a nonzero pivot kill a class born at the pivot’s row, producing a pair in \(H_{k-1}\). The collection of these intervals is the barcode or, equivalently, the persistence diagram.

04 When to Use

Use persistent homology when you need scale-robust shape descriptors of data. Typical scenarios include: (1) point clouds in \(\mathbb{R}^d\), where you suspect clusters, loops, or voids (e.g., sensor trajectories, molecular conformations); (2) images or voxel data, where cubical complexes capture tunnels and cavities; (3) graphs and networks, where \(H_0\) persistence provides a principled multiscale clustering (via Kruskal/MST) and higher-dimensional features can be studied on clique complexes; (4) time series, via sliding-window embeddings that turn dynamics into loops/tori; (5) model comparison, where persistence diagrams allow quantitative comparison through distances such as the bottleneck distance. Choose Vietoris–Rips for generic metric data (simple to build, but combinatorially heavy), alpha complexes for Euclidean point clouds (geometrically faithful and sparser), or cubical complexes for gridded data. Compute only as high a dimension as needed (often \(H_0\)–\(H_1\)), and restrict the filtration range to control complexity. Persistent homology is especially valuable when you need invariance to isometries and robustness to noise, and when conventional clustering or dimensionality reduction misses global structure like loops.

⚠️Common Mistakes

• Violating the face condition: inserting a simplex before its faces leads to invalid filtrations and wrong pairings. Always ensure every \(k\)-simplex enters no earlier than all of its \((k-1)\)-faces.
• Mishandling ties: when multiple simplices share the same filtration value, inconsistent ordering can change pairings. Break ties deterministically—e.g., by increasing dimension, then lexicographic vertex order.
• Ignoring coefficient fields: working over \(\mathbb{Z}_2\) simplifies signs; using other fields changes results (especially in the presence of torsion). State and fix the field upfront.
• Overbuilding Rips complexes: they blow up combinatorially; computing up to dimension 1 or 2 and capping the radius often suffices.
• Misinterpreting infinite bars: the infinite \(H_0\) bar corresponds to the final connected component (the whole space), not a special feature.
• Numerical instability: distances that are nearly equal can cause large tie sets; snap with a tolerance and use a stable sort.
• Memory blowups: dense boundary matrices are huge; prefer sparse column representations and avoid constructing unnecessary higher-dimensional simplices.
• Conflating homology dimensions: births in column reduction happen at the column’s own dimension; deaths recorded by a column of dimension \(k\) correspond to features in \(H_{k-1}\). Keep this dimension shift clear.

Key Formulas

Face Condition for Filtrations

\(f(\tau) \le f(\sigma)\) whenever \(\tau \subset \sigma\)

Explanation: A simplex can only appear after all of its faces. This guarantees each filtration level is a valid simplicial complex and enables correct persistence pairing.

Vietoris–Rips Complex

\(\mathrm{VR}_\epsilon(X) = \{\,\sigma = \{x_{i_0}, \dots, x_{i_k}\} : \max_{p<q} d(x_{i_p}, x_{i_q}) \le \epsilon\,\}\)

Explanation: A simplex is included if all its edges are short. In a filtration, one sets the filtration value of \(σ\) to the maximum edge length inside it.

Chain Groups and Boundary Maps

\(C_k(K; \mathbb{F}) = \mathbb{F}^{m_k}, \quad \partial_k : C_k \to C_{k-1}, \quad \partial_k \circ \partial_{k+1} = 0\)

Explanation: Chains are vectors over simplices; boundary maps send k-chains to (k−1)-chains and compose to zero. This structure defines cycles and boundaries.

Homology at Scale t

\(H_k(F_t; \mathbb{F}) = \ker \partial_k / \operatorname{im} \partial_{k+1}\)

Explanation: Homology groups capture k-dimensional holes present at filtration level t by modding out boundaries from cycles.

Betti Numbers

\(\beta_k(t) = \dim H_k(F_t; \mathbb{F})\)

Explanation: The number of independent k-dimensional features at time t. These counts can change only when a simplex enters the filtration.

Persistence Module Maps

\(\varphi_s^t : H_k(F_s) \to H_k(F_t), \quad s \le t\)

Explanation: Inclusion-induced maps track how homology classes evolve across scales. Their structure decomposes into interval modules, giving barcodes.

Stability Theorem (Bottleneck Distance)

\(d_B(D_f, D_g) \le \|f - g\|_\infty\)

Explanation: Small perturbations of the filtration function move the persistence diagram by at most the perturbation size. This justifies robustness of persistent homology.

Combinatorial Growth

Rips simplices up to dimension \(k\): \(\sum_{i=0}^{k} \binom{n}{i+1}\)

Explanation: The number of simplices grows rapidly with n and k. This limits practical computations to low dimensions and moderate n.

Boundary Matrix Reduction Cost

\(T_{\text{reduction}} = O(m^3)\) in the worst case

Explanation: Reducing an m-by-m boundary matrix densely can be cubic. Sparse methods and heuristics often make it much faster in practice.

H0 via Union–Find

\(T_{H_0} = O(E\,\alpha(n))\)

Explanation: Zero-dimensional persistence using Kruskal’s process is near-linear in the number of edges, where \(\alpha\) is the inverse Ackermann function.

Complexity Analysis

Let \(n\) be the number of input points, \(k\) the maximum homology dimension computed, and \(m\) the total number of simplices inserted into the filtration. For a Vietoris–Rips filtration up to dimension \(k\), the number of simplices can be as large as \(\sum_{i=0}^{k} \binom{n}{i+1}\), which is polynomial of degree \(k+1\) in \(n\) and rapidly becomes prohibitive. Constructing all edges requires \(O(n^2)\) distance computations, plus \(O(n^2 \log n)\) time if you explicitly sort edges by length. Building higher-dimensional simplices typically costs \(O(n^{k+1})\) checks; using adjacency lists and pruning by thresholds can reduce constants but not the worst-case growth. The standard persistence algorithm forms a boundary matrix with at most \(m\) columns and \(m\) rows (often block-structured by dimension). In the naive dense model, column reduction over a field has worst-case time \(O(m^3)\) and space \(O(m^2)\). Practical implementations use sparse columns (storing only nonzero row indices) and hash maps from pivots to columns, giving dramatically better performance on typical filtrations, but the worst case remains cubic. Memory usage becomes the main bottleneck well before time does; using compressed sparse columns and truncating the filtration (e.g., max radius) or dimension (e.g., up to 1 or 2) is crucial. For \(H_0\), there is a near-linear-time algorithm using union–find. Given \(E\) edges (or all \(\binom{n}{2}\) pairs), sorting edges by weight is \(O(E \log E)\), and each union/find operation is essentially constant amortized, for a total of \(O(E\,\alpha(n))\) after sorting. The \(H_0\) computation is equivalent to Kruskal’s MST process and scales to very large datasets. In summary: full persistence is expensive in \(m\); design filtrations to keep \(m\) small, compute only needed dimensions, and use sparse reductions.

Code Examples

Persistent Homology up to H1 via Boundary Matrix on a Vietoris–Rips Filtration (\(\mathbb{Z}_2\) coefficients)
#include <bits/stdc++.h>
using namespace std;

// This educational example builds a Vietoris–Rips filtration up to dimension 2
// from a small 2D point cloud, reduces the boundary matrix over Z2, and prints
// persistence intervals for H0 and H1. It is NOT optimized and intended for small inputs.

struct Simplex {
    vector<int> verts; // sorted vertex indices
    int dim;           // dimension = verts.size()-1
    double filt;       // filtration value (for Rips: max edge length in the simplex)
};

// Key encoding for a simplex's vertex set (sorted) to map to its index
static string keyOf(const vector<int>& v){
    string s; s.reserve(v.size()*3);
    for(size_t i=0;i<v.size();++i){
        if(i) s.push_back(',');
        s += to_string(v[i]);
    }
    return s;
}

// Symmetric difference (XOR) of two sorted vectors of ints
static vector<int> symdiff(const vector<int>& a, const vector<int>& b){
    vector<int> out; out.reserve(a.size()+b.size());
    size_t i=0,j=0;
    while(i<a.size() || j<b.size()){
        if(j==b.size() || (i<a.size() && a[i]<b[j])){ out.push_back(a[i++]); }
        else if(i==a.size() || b[j]<a[i]){ out.push_back(b[j++]); }
        else { // equal -> cancel in Z2
            ++i; ++j;
        }
    }
    return out;
}

// Compute Euclidean distances
static vector<vector<double>> pairwiseDistances(const vector<pair<double,double>>& pts){
    int n = (int)pts.size();
    vector<vector<double>> D(n, vector<double>(n, 0.0));
    for(int i=0;i<n;i++){
        for(int j=i+1;j<n;j++){
            double dx = pts[i].first - pts[j].first;
            double dy = pts[i].second - pts[j].second;
            double d = sqrt(dx*dx + dy*dy);
            D[i][j]=D[j][i]=d;
        }
    }
    return D;
}

int main(){
    // Example point cloud: noisy circle-like points
    vector<pair<double,double>> P = {
        {1.0,0.0},{0.707,0.707},{0.0,1.0},{-0.707,0.707},{-1.0,0.0},
        {-0.707,-0.707},{0.0,-1.0},{0.707,-0.707},{1.2,0.1},{-1.1,-0.1}
    };
    int n = (int)P.size();
    auto D = pairwiseDistances(P);

    // Build Vietoris–Rips filtration up to dim=2 with a radius cap to keep size small
    double maxR = 1.1; // cap (adjust to control size). Triangles/edges with filt > maxR are skipped.

    vector<Simplex> simplices;
    simplices.reserve(n + n*(n-1)/2);

    // 0-simplices (vertices) with filtration 0
    for(int i=0;i<n;i++){
        simplices.push_back({{i}, 0, 0.0});
    }

    // 1-simplices (edges)
    for(int i=0;i<n;i++){
        for(int j=i+1;j<n;j++){
            double w = D[i][j];
            if(w <= maxR){
                simplices.push_back({{i,j}, 1, w});
            }
        }
    }

    // 2-simplices (triangles): include if all three edges <= maxR (i.e., max edge length <= maxR)
    for(int i=0;i<n;i++){
        for(int j=i+1;j<n;j++){
            if(D[i][j] > maxR) continue;
            for(int k=j+1;k<n;k++){
                double m = max({D[i][j], D[i][k], D[j][k]});
                if(m <= maxR){
                    simplices.push_back({{i,j,k}, 2, m});
                }
            }
        }
    }

    // Sort by (filtration, dimension, lex vertices) to enforce a consistent order
    stable_sort(simplices.begin(), simplices.end(), [](const Simplex& a, const Simplex& b){
        if (a.filt != b.filt) return a.filt < b.filt;
        if (a.dim != b.dim) return a.dim < b.dim; // faces first on ties
        return a.verts < b.verts;
    });

    int m = (int)simplices.size();

    // Map simplex key -> index in sorted order
    unordered_map<string,int> indexOf; indexOf.reserve(m*2);
    for(int i=0;i<m;i++){
        indexOf[keyOf(simplices[i].verts)] = i;
    }

    // Build boundary columns: store each column as sorted list of row indices with 1s (Z2)
    vector<vector<int>> boundary(m);
    for(int col=0; col<m; col++){
        const auto& s = simplices[col];
        if(s.dim == 0) continue; // vertex has empty boundary
        // faces by removing one vertex at a time
        for(int t=0; t<(int)s.verts.size(); t++){
            vector<int> face = s.verts;
            face.erase(face.begin()+t);
            auto it = indexOf.find(keyOf(face));
            if(it == indexOf.end()){
                cerr << "Error: face not found in filtration (violates face condition)\n";
                return 1;
            }
            boundary[col].push_back(it->second);
        }
        // Sort rows to enable XOR via symmetric difference
        sort(boundary[col].begin(), boundary[col].end());
    }

    // Column reduction over Z2
    vector<int> low(m, -1);          // low[j] = pivot row of reduced column j, or -1 if zero column
    unordered_map<int,int> pivotCol; // pivot row -> column having that low
    pivotCol.reserve(m*2);

    for(int j=0; j<m; j++){
        auto& colVec = boundary[j];
        // compute low(j)
        auto getLow = [&](){ return colVec.empty() ? -1 : colVec.back(); };
        int lj = getLow();
        while(lj != -1){
            auto it = pivotCol.find(lj);
            if(it == pivotCol.end()) break;
            int k = it->second; // existing column with same low
            // XOR columns: col_j = col_j + col_k over Z2
            colVec = symdiff(colVec, boundary[k]);
            lj = getLow();
        }
        low[j] = lj;
        if(lj != -1){
            pivotCol[lj] = j; // j kills the class born at row lj
        }
    }

    // Collect intervals:
    // - If column j is zero after reduction (low[j]==-1), it creates an H_{dim(s_j)} class at
    //   filt(s_j), with infinite death unless killed later (which appears via another column's pairing)
    // - If low[j]==i, then column j (dim d) kills a class in H_{d-1} that was born at column i's filtration

    struct Interval { int dim; double birth, death; bool infinite; };
    vector<Interval> intervals;

    // Create finite intervals from nonzero columns
    for(int j=0;j<m;j++){
        if(low[j] != -1){
            int i = low[j];
            int d = simplices[j].dim - 1; // killed feature dimension
            if(d >= 0){
                intervals.push_back({d, simplices[i].filt, simplices[j].filt, false});
            }
        }
    }
    // Create infinite intervals from zero columns that were never used as a pivot row.
    // A zero column at dim d births an H_d class; if its index appears as some low[j], it is killed;
    // otherwise it persists to infinity
    vector<char> isKilled(m, 0);
    for(int j=0;j<m;j++) if(low[j]!=-1) isKilled[low[j]] = 1;
    for(int i=0;i<m;i++){
        if(!isKilled[i]){
            // Only zero columns are births; nonzero columns correspond to deaths of lower-dim classes
            if(low[i] == -1){
                int d = simplices[i].dim;
                intervals.push_back({d, simplices[i].filt, numeric_limits<double>::infinity(), true});
            }
        }
    }

    // Print results grouped by dimension
    cout.setf(std::ios::fixed); cout<<setprecision(4);
    cout << "Intervals (dimension 0):\n";
    for(const auto& I: intervals){
        if(I.dim==0){
            cout << " [" << I.birth << ", " << (I.infinite? string("inf"): to_string(I.death)) << ")\n";
        }
    }
    cout << "Intervals (dimension 1):\n";
    for(const auto& I: intervals){
        if(I.dim==1){
            cout << " [" << I.birth << ", " << (I.infinite? string("inf"): to_string(I.death)) << ")\n";
        }
    }

    return 0;
}

This program constructs a Vietoris–Rips filtration up to dimension 2 from a small 2D point set. Each 0-simplex has filtration 0; each edge’s filtration is its length; each triangle’s filtration is the maximum of its three edge lengths. After sorting simplices by (filtration, dimension, lexicographic vertices), it builds a sparse boundary matrix over \(\mathbb{Z}_2\), performs the standard column-reduction algorithm, and extracts intervals. A nonzero reduced column of dimension \(d\) pairs with its pivot row to give an \(H_{d-1}\) interval [birth_of_pivot, filtration_of_column). A zero column at dimension \(d\) creates an \(H_d\) class which persists unless later killed (which would be reflected as another column’s pivot). The output lists \(H_0\) and \(H_1\) intervals.

Time: O(n^2 log n + m^3) worst-case (n points; m simplices), with m dominated by Rips growth
Space: O(m^2) worst-case (dense), but this code stores sparse columns so closer to O(NZ) nonzeros
Fast H0 Persistence via Union–Find (Equivalent to Kruskal’s MST)
#include <bits/stdc++.h>
using namespace std;

// Compute zero-dimensional persistence (H0) from a point cloud by
// sorting all edges by length and unioning components (Kruskal).
// Each union corresponds to a death time for one H0 bar; one bar persists to infinity.

struct DSU {
    vector<int> p, r;
    DSU(int n=0){ init(n); }
    void init(int n){ p.resize(n); r.assign(n,0); iota(p.begin(), p.end(), 0); }
    int find(int x){ return p[x]==x ? x : p[x]=find(p[x]); }
    bool unite(int a, int b){
        a=find(a); b=find(b);
        if(a==b) return false;
        if(r[a]<r[b]) swap(a,b);
        p[b]=a;
        if(r[a]==r[b]) r[a]++;
        return true;
    }
};

int main(){
    vector<pair<double,double>> P = {
        {0,0},{1,0},{2,0},{0,1},{1,1},{2,1}
    }; // 6 points in a grid-like pattern
    int n = (int)P.size();

    // Build all edges with their lengths
    struct Edge{ int u,v; double w; };
    vector<Edge> E;
    for(int i=0;i<n;i++) for(int j=i+1;j<n;j++){
        double dx=P[i].first-P[j].first, dy=P[i].second-P[j].second;
        double d=hypot(dx,dy);
        E.push_back({i,j,d});
    }
    sort(E.begin(), E.end(), [](const Edge& a, const Edge& b){ return a.w < b.w; });

    // Initially each vertex births an H0 class at time 0.
    // Each successful union kills one class at the current edge weight.
    DSU dsu(n);
    vector<pair<double,double>> intervals; // [birth, death)
    int components = n;
    for(const auto& e : E){
        if(dsu.unite(e.u, e.v)){
            intervals.push_back({0.0, e.w}); // one bar dies now (birth=0)
            components--;
            if(components==1) break; // remaining bar is infinite
        }
    }

    cout.setf(std::ios::fixed); cout<<setprecision(4);
    cout << "H0 intervals from Kruskal/Union-Find:\n";
    for(const auto& I : intervals){
        cout << " [" << I.first << ", " << I.second << ")\n";
    }
    cout << " [0.0000, inf)\n"; // final component persists
    return 0;
}

This code computes \(H_0\) persistence by sorting all pairwise edges and performing union–find merges—exactly Kruskal’s MST process. Each time two components merge under the smallest available edge, one \(H_0\) bar dies at that edge length, while all births occur at 0 for points. One component remains forever, yielding the infinite interval [0, ∞). This scales to large datasets and is the standard fast approach for connected-component persistence.

Time: O(E log E + E α(n)), where E = n(n−1)/2 for all pairwise edges
Space: O(E + n)
#persistent homology#filtration#vietoris-rips#barcode#persistence diagram#boundary matrix reduction#betti numbers#tda#kruskal#union find#bottleneck distance#alpha complex#simplicial complex#homology#point cloud