
Random Walks on Graphs

Key Points

  • A random walk on a graph moves from a node to one of its neighbors chosen uniformly at random at each step.
  • The transition matrix P = D^{-1}A encodes these move probabilities, where A is the adjacency matrix and D is the diagonal degree matrix.
  • For connected, non-bipartite, undirected graphs, the walk converges to a stationary distribution proportional to node degrees.
  • Many properties of random walks (mixing speed, hitting times) are governed by eigenvalues of P or the normalized Laplacian.
  • Power iteration and Monte Carlo simulation let us compute distributions and hitting times efficiently on large graphs.
  • Adding laziness (P' = (I + P)/2) or teleportation avoids periodicity and guarantees convergence in tricky graphs.
  • In C++, it's best to avoid forming dense matrices; use adjacency lists and distribute probability mass along edges.
  • Common pitfalls include mishandling dangling nodes (degree 0), ignoring periodicity in bipartite graphs, and numerical drift in probabilities.

Prerequisites

  • Graph representations (adjacency list/matrix) — You must know how to store and traverse neighbors efficiently to implement P without building dense matrices.
  • Basic probability and Markov chains — Random walks are Markov chains; understanding distributions, transitions, and stationarity is essential.
  • Linear algebra fundamentals — Defining P = D^{-1}A, eigenvalues/eigenvectors, and similarity transforms underpin convergence analysis.
  • Spectral graph theory (intro) — Eigenvalues of the normalized Laplacian or P control mixing, hitting times, and clustering behavior.
  • C++ vectors and numeric stability — Accurate probability computations require careful accumulation, normalization, and random number use.
  • Random number generation — Simulations rely on high-quality RNGs to model random steps and teleportation.

Detailed Explanation


01 Overview

Imagine a tourist who is lost in a city. At each intersection (node), they choose one of the connecting streets (edges) uniformly at random and walk to the next intersection. This process is a random walk on a graph. The heart of the model is the transition matrix P = D^{-1}A: it turns the adjacency structure of the graph (A) into movement probabilities by normalizing each row by the node's degree (D). The (i, j)-th entry P_{ij} is the probability of moving from i to j in one step.

The concept becomes powerful because it connects discrete graph structure with probability and linear algebra. Over many steps, the distribution of the tourist's location evolves by multiplying by P repeatedly (p^{(t+1)} = p^{(t)} P). Under mild conditions (connected and non-bipartite for undirected graphs), this distribution converges to a stationary distribution that does not change when multiplied by P. This reveals central nodes (they are visited more often), how fast randomness spreads (mixing time), and how long it takes to reach a target (hitting time).

Random walks underpin algorithms in ranking (PageRank), clustering (spectral methods), sampling (Markov chain Monte Carlo), and network analysis (resistance, diffusion). They scale well because transitions depend only on local neighborhoods, making them ideal for the sparse graphs common in real data sets.

02 Intuition & Analogies

Think of P = D^{-1}A as a “fair splitter” for each node's outgoing probability. If node i has degree d_i, you divide its probability mass equally into d_i buckets and pour one bucket onto each neighbor. That is exactly what a row of P does: it normalizes the adjacency row by d_i so that the probabilities sum to 1.

A good everyday analogy is shuffling a deck versus walking hallways. In shuffling, you mix cards globally in one go; in random walks, you mix locally, passing probability through doors from one room to the next. The speed of mixing depends on how many short alternative routes there are (expansion) and whether you can get stuck bouncing back and forth (periodicity in bipartite graphs). Adding “laziness” is like sometimes pausing in place; it breaks rigid back-and-forth cycles so the process settles.

The stationary distribution is like the long-run crowd count in a mall: busy intersections (high degree) collect more people on average because there are more ways to arrive. For undirected graphs, the long-run share at node i is proportional to how many doors lead into it (its degree). If all degrees are equal (a regular graph), every node is equally likely in the long run.

Finally, hitting and commute times reflect navigability: how many steps until the walker reaches a store (the target)? Dense, well-connected areas shorten these times; narrow bridges and bottlenecks lengthen them. Spectral quantities (eigenvalues) distill these connectivity features into numbers that predict mixing and search efficiency.

03 Formal Definition

Let G = (V, E) be a finite graph with |V| = n. The adjacency matrix A ∈ R^{n×n} is defined by A_{ij} = 1 if (i, j) ∈ E (or the weight w_{ij} for weighted graphs), and 0 otherwise. The degree matrix D is diagonal with D_{ii} = d_i = Σ_j A_{ij}. The random-walk transition matrix is P = D^{-1}A, a row-stochastic matrix (each row sums to 1) assuming d_i > 0. For nodes with d_i = 0 (dangling), one can define P_{ii} = 1 (self-loop) or introduce teleportation. A (row) probability distribution p^{(t)} over V evolves as p^{(t+1)} = p^{(t)} P, so the k-step transition probabilities are given by P^k. A stationary distribution π satisfies π = πP (equivalently, P^T π = π with π as a column vector). For connected, non-bipartite undirected graphs, π exists, is unique, and π_i = d_i / Σ_j d_j. Moreover, the chain is reversible with respect to π (detailed balance): π_i P_{ij} = π_j P_{ji}. Define the random-walk Laplacian L_rw = I − P and the symmetric normalized matrix S = D^{-1/2} A D^{-1/2} = D^{1/2} P D^{-1/2}, which is similar to P. The spectrum of S (eigenvalues 1 = λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_n ≥ −1) governs convergence: the spectral gap 1 − |λ_2| controls the mixing rate. A lazy walk uses P' = (I + P)/2 to ensure aperiodicity on bipartite graphs.

04 When to Use

  • Ranking and centrality: Use random walks to score nodes by long-run visit frequency (stationary distribution). This underlies PageRank (with teleportation) for directed graphs and degree centrality for undirected graphs.
  • Community detection and clustering: Short random walks tend to stay within communities before leaking out, which powers methods like spectral clustering and diffusion distances.
  • Sampling and estimation: When direct uniform sampling of nodes or substructures is hard, random walks provide approximate samples, especially in massive, implicitly defined graphs.
  • Diffusion modeling: Spread of information, heat, or epidemics can be approximated as random walks, where P models local spread.
  • Hitting/commute times: Estimate how quickly a search reaches a target, or compare pairwise accessibility via commute times; useful in navigation, recommendation, and network reliability.
  • Preprocessing for machine learning on graphs: Random-walk features (node2vec/DeepWalk) and k-step distributions provide informative node embeddings.

Choose pure P = D^{-1}A when the graph is undirected, connected, and you can tolerate degree bias. Use lazy walks to avoid periodicity on bipartite graphs. Use teleportation (the Google matrix) for directed graphs or when dangling nodes and dead ends are common.

⚠️ Common Mistakes

  • Ignoring dangling nodes (degree 0): Without a fix, rows in P are invalid. Add self-loops (P_{ii} = 1) for pure walks or use teleportation to distribute probability globally.
  • Overlooking periodicity: Non-lazy walks on bipartite graphs oscillate and do not converge to a stationary distribution from arbitrary starts. Use P' = (I + P)/2 to ensure aperiodicity.
  • Building dense matrices: Forming dense P costs O(n^2) memory/time and is unnecessary on sparse graphs. Operate with adjacency lists and distribute mass along neighbors.
  • Confusing left vs. right eigenvectors: With row-stochastic P, the stationary distribution satisfies π = πP (a left eigenvector). Iteratively update row distributions p^{(t+1)} = p^{(t)} P, or equivalently multiply P^T by a column vector.
  • Numerical drift: Floating-point sums may slowly lose the property that probabilities sum to 1. Renormalize vectors each iteration and use double precision.
  • Misinterpreting degree bias: In undirected graphs, high-degree nodes have higher stationary mass by design. If this is undesirable, consider normalized objectives or teleportation.
  • Too few iterations: Power iteration may require many steps when the spectral gap is small. Monitor convergence with ||p^{(t+1)} - p^{(t)}|| and cap iterations wisely.

Key Formulas

Transition Matrix

P = D^{-1}A

Explanation: Each row of A is normalized by the node’s degree to create a row-stochastic matrix. Entry Pij​ is the probability of moving from i to j in one step.

Distribution Evolution

p^{(t+1)} = p^{(t)} P,   p^{(t)} = p^{(0)} P^t

Explanation: The distribution after t steps is obtained by multiplying the initial distribution by P t times. This describes how probability mass flows along edges.

Stationary Distribution

π = πP ⇔ P^T π = π

Explanation: The stationary distribution is unchanged by the transition. It is a left eigenvector of P with eigenvalue 1 (or a right eigenvector of P^T).

Undirected Stationary Law

π_i = d_i / Σ_{j=1}^n d_j = d_i / (2m)

Explanation: For a connected, non-bipartite undirected graph, the stationary probability of node i is proportional to its degree. Here m is the number of edges.

Detailed Balance

π_i P_{ij} = π_j P_{ji}

Explanation: This equality makes the chain reversible, enabling spectral analysis via symmetric matrices and simplifying many proofs.

Random-Walk and Normalized Laplacian

L_rw = I − P,   L = I − D^{-1/2} A D^{-1/2}

Explanation: The random-walk Laplacian relates directly to P. The normalized Laplacian L is symmetric and similar to P, making eigen-analysis convenient.

Symmetric Similarity

S = D^{-1/2} A D^{-1/2} = D^{1/2} P D^{-1/2}

Explanation: S is symmetric and similar to P, so it shares eigenvalues with P. This connection allows using tools for symmetric matrices to study P.

Lazy Random Walk

P' = (1/2)(I + P)

Explanation: Staying put with probability 1/2 removes periodicity (e.g., bipartite oscillations) while preserving the stationary distribution.

Mixing Bound (Undirected)

‖p^{(t)} − π‖_TV ≤ (1/2) √(d_max / d_min) · |λ_2|^t

Explanation: For an undirected walk, the speed of convergence to the stationary distribution is controlled by the second-largest eigenvalue magnitude |λ_2|. A larger spectral gap implies faster mixing.

Expected Return Time

E[T_i^+] = 1 / π_i

Explanation: The expected number of steps to return to node i (after leaving) equals the reciprocal of its stationary probability.

Hitting Time Equations

h_t = 0,   h_i = 1 + Σ_{j=1}^n P_{ij} h_j   (i ≠ t)

Explanation: Hitting times satisfy a linear system with boundary condition at the target. Solutions give expected steps to reach t from each node.

Commute Time–Resistance

C_{ij} = 2m R_{ij}

Explanation: In undirected graphs, the expected time to go from i to j and back equals 2m times the effective electrical resistance between i and j.

Google Matrix (Teleportation)

G = αP + (1 − α) 1 v^T

Explanation: With probability α follow P, and with 1−α jump according to v. This fixes dangling nodes and ensures a unique stationary distribution even in directed graphs.

Complexity Analysis

Let n be the number of nodes and m the number of edges (treat m as the number of directed edges for directed graphs). Constructing P explicitly as a dense n×n matrix takes O(n^2) time and space, which is impractical for sparse graphs. Instead, store the graph in adjacency lists, which take O(n + m) space. Applying one step of the random-walk operator to a probability vector p (computing pP) can be done by distributing mass along neighbors in O(m) time: for each edge (i, j), add p[i]/deg(i) to the j-th entry.

Power iteration to approximate the stationary distribution performs T iterations of p ← pP, costing O(Tm) time and O(n) extra space for vectors. The convergence rate depends on the spectral gap 1 − |λ_2|; a small gap implies more iterations (sometimes thousands for hard graphs). Adding laziness or teleportation typically increases the gap and speeds convergence numerically.

Monte Carlo simulation of S steps of a single random walk takes O(S) time and O(1) extra space beyond the graph. Estimating hitting times by averaging over R independent runs of up to S steps costs O(RS) time. The variance decreases like 1/R, so doubling accuracy may require quadrupling runs.

If one must compute k-step transition probabilities for many k or many sources simultaneously, repeated sparse matrix–vector multiplies (SpMV) remain O(m) per step per vector. Building P explicitly as a sparse matrix (CSR/CSC) allows batched operations but still uses O(n + m) memory. Avoid dense matrix powers P^k, which cost O(n^3 log k) time and O(n^2) space, unless n is very small. Overall, practical workflows leverage sparse representations, SpMV, and simulation to keep both time and memory near linear in graph size.

Code Examples

Simulate a random walk and estimate empirical visit frequencies
#include <bits/stdc++.h>
using namespace std;

// Simulates a random walk on an unweighted graph using adjacency lists.
// Also computes empirical visit frequencies over the trajectory.

struct Graph {
    int n;                    // number of nodes (0..n-1)
    vector<vector<int>> adj;  // adjacency list
    vector<int> deg;          // degrees
    explicit Graph(int n = 0) : n(n), adj(n), deg(n, 0) {}
    void add_edge(int u, int v, bool undirected = true) {
        adj[u].push_back(v); deg[u]++;
        if (undirected) { adj[v].push_back(u); deg[v]++; }
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Build a sample undirected graph (a square with a diagonal)
    // 0-1-2-3-0 and 1-3 diagonal
    int n = 4;
    Graph G(n);
    G.add_edge(0,1); G.add_edge(1,2); G.add_edge(2,3); G.add_edge(3,0); G.add_edge(1,3);

    int start = 0;
    int steps = 100000;  // large enough to approximate the stationary distribution

    // Random engine
    mt19937_64 rng(123456);

    vector<long long> visits(n, 0);
    int cur = start;
    visits[cur]++;

    for (int t = 0; t < steps; ++t) {
        if (G.deg[cur] == 0) {
            // Dangling node: stay in place (self-loop).
            // Alternative: teleport uniformly or to a preference distribution.
        } else {
            uniform_int_distribution<int> dist(0, G.deg[cur] - 1);
            cur = G.adj[cur][dist(rng)];
        }
        visits[cur]++;
    }

    // Compute empirical frequencies
    double total = accumulate(visits.begin(), visits.end(), 0.0);
    cout.setf(ios::fixed); cout << setprecision(6);
    cout << "Empirical visit frequencies (approx. stationary for undirected, non-bipartite):\n";
    for (int i = 0; i < n; ++i)
        cout << "node " << i << ": " << (visits[i] / total) << "\n";

    // For undirected graphs, theoretical stationary pi_i = deg(i)/(2m)
    long long m = 0;
    for (int i = 0; i < n; ++i) m += G.deg[i];
    m /= 2;  // number of undirected edges
    cout << "\nTheory (pi_i = deg(i)/(2m)):\n";
    for (int i = 0; i < n; ++i)
        cout << "node " << i << ": " << (double)G.deg[i] / (2.0 * m) << "\n";

    return 0;
}

We build a small undirected graph and run a long random walk, counting how often each node is visited. The empirical frequencies approach the stationary distribution. For undirected, connected, non-bipartite graphs, the stationary probability equals degree divided by 2m. We include a simple dangling-node rule (self-loop) to keep P valid even if deg=0.

Time: O(steps) for the simulation, O(n + m) to set up the graph. Space: O(n + m) for the graph, O(n) for visit counters.

Power iteration for stationary distribution with optional teleportation (PageRank-style)
#include <bits/stdc++.h>
using namespace std;

// Computes the stationary distribution by power iteration without forming dense P.
// Supports a pure random walk (alpha = 0) and teleportation (alpha in (0,1)).
// For alpha = 0 on undirected, connected, non-bipartite graphs, the result
// converges to pi_i ∝ deg(i). For directed graphs or graphs with dangling
// nodes, use alpha in (0,1) (e.g., 0.15) to ensure uniqueness.

struct Graph {
    int n;
    vector<vector<int>> out;  // outgoing neighbors
    vector<int> deg;          // out-degrees
    explicit Graph(int n = 0) : n(n), out(n), deg(n, 0) {}
    void add_edge(int u, int v) { out[u].push_back(v); deg[u]++; }
};

vector<double> power_iteration_stationary(const Graph& G,
        double alpha,                    // teleport probability (0 => pure walk)
        const vector<double>& teleport,  // teleportation distribution (sum = 1), size n
        int max_iters = 10000,
        double tol = 1e-12,
        bool lazy = false) {             // optionally use the lazy walk P' = (I + P)/2
    int n = G.n;
    vector<double> p(n, 1.0 / n), nextp(n, 0.0);

    auto l1_norm = [](const vector<double>& v) {
        double s = 0; for (double x : v) s += fabs(x); return s;
    };

    double walk_share = 1.0 - alpha;  // fraction of mass that follows the walk

    for (int it = 0; it < max_iters; ++it) {
        fill(nextp.begin(), nextp.end(), 0.0);

        // Teleportation contribution: with probability alpha, jump according to 'teleport'
        if (alpha > 0)
            for (int j = 0; j < n; ++j) nextp[j] += alpha * teleport[j];

        // Distribute the remaining mass along edges according to P (and laziness)
        for (int i = 0; i < n; ++i) {
            double mass = walk_share * p[i];
            if (lazy) { nextp[i] += 0.5 * mass; mass *= 0.5; }  // lazy self-loop component
            if (G.deg[i] == 0) {
                // Dangling node: self-loop in a pure walk; redistribute via teleport otherwise
                if (alpha > 0)
                    for (int j = 0; j < n; ++j) nextp[j] += mass * teleport[j];
                else
                    nextp[i] += mass;
            } else {
                double per_edge = mass / (double)G.deg[i];
                for (int j : G.out[i]) nextp[j] += per_edge;
            }
        }

        // Renormalize to guard against numerical drift
        double s = 0;
        for (double x : nextp) s += x;
        for (double& x : nextp) x /= s;

        // Check convergence in L1
        vector<double> diff(n);
        for (int i = 0; i < n; ++i) diff[i] = nextp[i] - p[i];
        if (l1_norm(diff) < tol) return nextp;
        p.swap(nextp);
    }
    return p;  // return the last iterate if not converged
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example: a small directed graph with a dangling node
    // 0 -> 1, 0 -> 2, 1 -> 2, 2 -> (none, dangling), 3 -> 0
    int n = 4;
    Graph G(n);
    G.add_edge(0,1); G.add_edge(0,2); G.add_edge(1,2); G.add_edge(3,0);

    // Teleportation distribution: uniform
    vector<double> teleport(n, 1.0 / n);

    // Compute a PageRank-like stationary distribution
    double alpha = 0.15;  // teleport probability
    bool lazy = false;    // set true to further guard against periodicity
    auto pr = power_iteration_stationary(G, alpha, teleport, 10000, 1e-12, lazy);

    cout.setf(ios::fixed); cout << setprecision(8);
    cout << "Stationary distribution (with teleportation alpha=0.15):\n";
    for (int i = 0; i < n; ++i) cout << "node " << i << ": " << pr[i] << "\n";

    // For an undirected, connected, non-bipartite graph and alpha = 0,
    // this becomes degree-proportional.
    return 0;
}

This implements power iteration without forming P explicitly. At each step, it adds teleportation mass (if alpha > 0) and distributes non-teleport mass along outgoing edges. Dangling nodes either keep mass (pure walk) or redistribute via teleportation. Optional laziness helps with periodicity. The algorithm returns a probability vector that approximates the stationary distribution.

Time: O(m × iterations), where m is the number of (directed) edges. Space: O(n + m) for the graph, O(n) for probability vectors.

Estimate hitting time by Monte Carlo with optional lazy random walk
#include <bits/stdc++.h>
using namespace std;

struct Graph {
    int n;
    vector<vector<int>> adj;
    vector<int> deg;
    explicit Graph(int n = 0) : n(n), adj(n), deg(n, 0) {}
    void add_edge(int u, int v, bool undirected = true) {
        adj[u].push_back(v); deg[u]++;
        if (undirected) { adj[v].push_back(u); deg[v]++; }
    }
};

// Estimate the expected hitting time from s to t by averaging over 'trials' walks.
// Uses optional laziness to avoid periodicity in bipartite graphs.
double estimate_hitting_time(const Graph& G, int s, int t, int trials, int max_steps,
                             bool lazy, uint64_t seed = 12345) {
    mt19937_64 rng(seed);
    uniform_real_distribution<double> U(0.0, 1.0);

    long long total_steps = 0;
    for (int r = 0; r < trials; ++r) {
        int cur = s;
        int steps = 0;
        while (cur != t && steps < max_steps) {
            if (lazy && U(rng) < 0.5) {
                // stay in place
            } else if (G.deg[cur] > 0) {
                uniform_int_distribution<int> dist(0, G.deg[cur] - 1);
                cur = G.adj[cur][dist(rng)];
            }
            // else: dangling node, treat as a self-loop
            ++steps;
        }
        total_steps += steps;
    }
    return (double)total_steps / (double)trials;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // A simple graph with a bottleneck: two triangles connected by a bridge edge (2-3)
    // Triangle 1: 0-1-2-0, Triangle 2: 3-4-5-3, Bridge: 2-3
    Graph G(6);
    G.add_edge(0,1); G.add_edge(1,2); G.add_edge(2,0);
    G.add_edge(3,4); G.add_edge(4,5); G.add_edge(5,3);
    G.add_edge(2,3);

    int s = 0, t = 5;
    double ht_lazy  = estimate_hitting_time(G, s, t, /*trials=*/5000, /*max_steps=*/10000, /*lazy=*/true);
    double ht_plain = estimate_hitting_time(G, s, t, /*trials=*/5000, /*max_steps=*/10000, /*lazy=*/false);

    cout.setf(ios::fixed); cout << setprecision(3);
    cout << "Estimated hitting time s=0 -> t=5 (lazy walk): " << ht_lazy << " steps\n";
    cout << "Estimated hitting time s=0 -> t=5 (plain walk): " << ht_plain << " steps\n";

    return 0;
}

We estimate the expected number of steps to hit a target node by averaging over many simulated walks. Laziness can reduce oscillations in bipartite or near-bipartite structures and often stabilizes estimates. A max_steps cap prevents infinite loops in disconnected graphs; if s cannot reach t, the estimate will saturate at max_steps.

Time: O(trials × max_steps). Space: O(n + m) for the graph; O(1) extra per trial.