⚙️ Algorithm · Advanced

DP with Expected Value

Key Points

  • Dynamic programming with expected value solves problems where each state transitions randomly and we seek the expected cost, time, or steps to reach a goal.
  • Use the key recurrence E[state] = sum over transitions of probability × (immediate cost + E[next]).
  • Backward evaluation works on DAGs or acyclic processes; for cycles, set up and solve linear equations because a simple DP order does not exist.
  • Linearity of expectation lets you sum expectations of parts, even if they are dependent, which simplifies many counting expectations.
  • Indicator variables are powerful: the expected count of events equals the sum of their individual probabilities.
  • Self-loops require rearranging the equation to E[s] = (1 + sum_{t≠s} P(s, t)·E[t]) / (1 − P(s, s)); a short sketch follows this list.
  • Numerical stability matters: normalize probabilities, avoid division by numbers close to zero, and consider Gaussian elimination with pivoting for general systems.
  • Common applications include expected hitting time in Markov chains, coupon collector–style processes, randomized graph walks, and board games with dice.
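
As a concrete sketch of the self-loop rearrangement mentioned above, consider a linear dice board where overshooting the goal wastes the turn. This is an added illustration, not part of the original list; the board length and the fair six-sided die are assumptions.

#include <bits/stdc++.h>
using namespace std;

// Hedged sketch: expected rolls to reach square n from square 0 on a linear
// board with a fair six-sided die. A roll that would overshoot n is wasted,
// i.e. the state has a self-loop with probability max(0, 6 - (n - s)) / 6.
// Rearranging E[s] = 1 + selfP * E[s] + sum_d (1/6) * E[s + d] gives
// E[s] = (1 + sum_{d=1..min(6, n-s)} (1/6) * E[s + d]) / (1 - selfP).
int main() {
    int n = 20; // assumed board length for this illustration
    vector<double> E(n + 1, 0.0); // E[n] = 0: already at the goal
    for (int s = n - 1; s >= 0; --s) {
        int remaining = n - s;
        double selfP = max(0, 6 - remaining) / 6.0; // overshooting faces stay put
        double rhs = 1.0; // the current roll always costs one turn
        for (int d = 1; d <= min(6, remaining); ++d) rhs += E[s + d] / 6.0;
        E[s] = rhs / (1.0 - selfP); // self-loop term moved to the left-hand side
    }
    printf("Expected rolls from start: %.10f\n", E[0]);
    return 0;
}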

Prerequisites

  • Basic probability and random variables: understanding expectation, conditional probability, and distributions is essential to form the DP recurrences.
  • Dynamic programming fundamentals: you must recognize states, transitions, and how to compute values from terminal conditions.
  • Directed graphs and DAGs: backward computation relies on topological orderings and recognizing acyclicity.
  • Systems of linear equations and Gaussian elimination: cyclic dependencies require solving linear systems to obtain expectations.
  • Floating-point arithmetic and numerical stability: expectations often involve small/large probabilities; stability and precision matter.

Detailed Explanation

01 Overview

DP with expected value addresses problems where outcomes are random and we want the average (expected) performance, such as expected steps, time, or cost to reach a goal. Instead of maximizing or minimizing deterministically, we write, for each state, an equation for E[state], the expected total cost from that state. The backbone idea is linearity of expectation and the law of total expectation: the expectation from a state equals the weighted average (by probabilities) of immediate cost plus the expectation from successor states. On directed acyclic structures (like DAGs or monotone progress), we can compute E[state] by dynamic programming from sinks (goals) backward. When cycles exist (for example, self-loops or general Markov chains), the dependencies become circular; then we convert the DP equations into a linear system and solve it (often by Gaussian elimination). This approach generalizes across many competitive programming problems: expected number of moves to finish a board game, time to collect all items, number of times an event occurs, and more. Two additional tools make life easier: indicator variables (to count expected totals by summing simple probabilities) and careful handling of self-loops by moving terms to the left-hand side. The result is a versatile toolkit blending probability, graph thinking, and linear algebra.

02 Intuition & Analogies

Picture walking in a maze where at each junction you toss a biased coin to decide which corridor to take. You want to know, on average, how many steps it takes to exit. If you stood at a junction and asked, “What’s my expected remaining steps?”, you would say: one step now plus the average of what happens after, weighted by the chance you take each corridor. If the maze has no cycles (you always move closer to the exit), you can start from the exit (0 steps) and work backward—this is classic DP. If the maze has traps that can send you in circles, your question at a junction depends on itself indirectly; your equation references E[this junction] again through loops. That’s fine: it just becomes an algebra problem where you solve a small system of equations to find the number. Another intuition comes from board games like rolling a die to advance. Each roll adds 1 turn. Sometimes you stay put (overshoot), sometimes you move forward, sometimes a chute sends you back. The expected total turns from a square is 1 (for the current roll) plus the average of futures. If certain rolls keep you on the same square, you can imagine paying 1 turn and retrying; the self-loop increases the expectation because sometimes you “waste” a roll. For counting expectations (how many coupons you collect after k rolls, or how many fixed points a random permutation has), indicator variables are like little light bulbs that turn on if an event occurs. The expected number of lit bulbs is just the sum of their probabilities of being on—no matter how the bulbs influence each other. This removes complex dependencies and keeps the math simple.
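
To make the indicator-variable idea concrete, here is a minimal sketch (an added example; the parameters n and k are assumptions). The expected number of distinct coupon types seen after k uniform draws from n types is a sum of per-type probabilities, even though the indicators are dependent.

#include <bits/stdc++.h>
using namespace std;

// Indicator sketch: let I_i = 1 if coupon type i appears in k uniform draws.
// P(I_i = 1) = 1 - (1 - 1/n)^k, so by linearity of expectation the expected
// number of distinct types is n * (1 - (1 - 1/n)^k), with no need to model
// the dependence between the indicators.
int main() {
    int n = 10, k = 15; // assumed example parameters
    double perType = 1.0 - pow(1.0 - 1.0 / n, k); // P(type seen at least once)
    double expectedDistinct = n * perType;        // sum of n identical probabilities
    printf("Expected distinct types: %.10f\n", expectedDistinct);
    return 0;
}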

03 Formal Definition

Let S be a finite set of states with an absorbing subset G ⊆ S (goal states). For each non-goal state s ∈ S \ G, we have a transition distribution P(s, t) over t ∈ S and an immediate cost c(s) (often c(s) = 1 for “one step” problems). Define the expected total cost to reach G by E[s], with boundary condition E[g] = 0 for g ∈ G. The law of total expectation yields the Bellman-like equation: E[s] = c(s) + sum_t P(s, t)·E[t]. If there is a self-loop with probability P(s, s) > 0, we may rearrange to (1 − P(s, s))·E[s] = c(s) + sum_{t≠s} P(s, t)·E[t], provided 1 − P(s, s) > 0. In matrix form over non-goal states, let x be the vector of E[s], A = I − Q the identity minus the submatrix Q of transition probabilities among non-goal states, and b the vector of immediate costs (often all ones). Then A·x = b, with boundary conditions handled by removing goal states (or equivalently by including them with fixed zeros). For counting expectations over events {A_i}, define indicator variables I_i = 1 if A_i occurs and 0 otherwise. By linearity of expectation, E[sum_i I_i] = sum_i E[I_i] = sum_i P(A_i), regardless of dependence among events. On acyclic transition graphs (DAGs), topological order ensures a simple backward DP evaluation because every E[s] depends only on E[t] with t “later” in the order.
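
A minimal worked instance of these definitions: take one non-goal state s with P(s, s) = 1/2, P(s, g) = 1/2 for a goal g, and c(s) = 1. The equation E[s] = 1 + (1/2)·E[s] + (1/2)·0 rearranges to (1/2)·E[s] = 1, so E[s] = 2: the familiar fact that a trial succeeding with probability 1/2 takes two attempts on average.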

04 When to Use

Use DP with expected value when: (1) you need the expected number of steps, time, or cost in a stochastic process with states and transitions; (2) you can write a recurrence that expresses E[state] via its successors and an immediate cost; and (3) either the state graph is acyclic (allowing backward DP) or you can afford to solve a linear system for cyclic dependencies. Common competitive programming patterns include: expected turns to finish a board game with dice and special cells; randomized algorithms’ running time on structured inputs; expected path length in DAGs when each outgoing edge is chosen uniformly or by given probabilities; coupon collector and its variants, often solved by a 1D DP over “how many items collected” with a clean backward recurrence; expected number of occurrences of an event over many trials via indicators (sum of probabilities), which circumvents complex dependencies; and hitting times in Markov chains (absorbing states), where you either exploit structure (tri-diagonal for simple random walks) or use Gaussian elimination for general graphs. Avoid this tool when you need full distributions (variance or quantiles) or when the state space is too large to store expectations or solve linear systems; in such cases, consider Monte Carlo simulation or structural simplifications.

⚠️ Common Mistakes

  1. Confusing expectation with probability: E[X] is an average value, not P(X occurs). For counts, use indicator variables to connect them correctly.
  2. Misusing linearity: E[f(X) + g(Y)] = E[f(X)] + E[g(Y)], but E[XY] ≠ E[X]E[Y] in general; linearity does not apply to products or nonlinear functions.
  3. Ignoring cycles: attempting backward DP on graphs with cycles leads to infinite loops or undefined order. For cycles, write linear equations and solve them, or identify a contraction structure that permits iterative methods.
  4. Self-loop division by near-zero: when P(s, s) is close to 1, E[s] can be large and numerically unstable; keep enough precision, and prefer a linear-system approach with pivoting.
  5. Not normalizing probabilities: ensure outgoing probabilities sum to 1 for non-absorbing states; otherwise the equations are wrong.
  6. Off-by-one in immediate costs: decide whether the current step contributes cost 1 before or after the transition; be consistent across states.
  7. Floating-point precision: use double with care, format outputs appropriately, and consider long double when sums of many small probabilities are present.
  8. Overlooking boundary conditions: set E[goal] = 0 (or the correct terminal cost) and ensure goals have no outgoing transitions (or that transitions from goals are modeled consistently).

Key Formulas

Linearity of Expectation

E[X + Y] = E[X] + E[Y]

Explanation: The expected value of a sum equals the sum of expected values, even if X and Y are dependent. This is the key tool for decomposing expectations.

Law of Total Expectation

E[X] = sum_s P(S = s)·E[X | S = s]

Explanation: Partition the randomness by a variable S, compute conditional expectations, and take their weighted average. This underlies the DP recurrence by conditioning on the next transition.

Expectation DP Recurrence

E[s] = c(s) + sum_t P(s, t)·E[t], with E[g] = 0 for goal states g

Explanation: The expected cost from state s equals the immediate cost plus the average of expected costs of successor states. Commonly c(s) = 1 for expected step counts.

Handling Self-Loops

(1 − P(s, s))·E[s] = c(s) + sum_{t≠s} P(s, t)·E[t]

Explanation: Move the self-loop term to the left so you can solve for E[s]. This avoids infinite regress when a state can stay put.

Linear System for Hitting Time

(I − Q)·x = b, where Q is the transition submatrix among non-terminal states

Explanation: For non-terminal states, the expectations satisfy a linear system with coefficient matrix identity minus the transition submatrix. Solve for x to obtain expected hitting times.

Indicator Summation

E[sum_i I_i] = sum_i P(A_i)

Explanation: The expected count of events equals the sum of their probabilities. This works regardless of dependencies among events.

Coupon Collector Backward Recurrence

E[k] = n/(n − k) + E[k+1], with E[n] = 0

Explanation: With k of n types collected, the expected additional draws equal n/(n−k) plus the expectation from state k+1. Summing yields the harmonic number formula.

Coupon Collector Expectation

E[T] = n·H_n = n·(1 + 1/2 + ... + 1/n)

Explanation: The expected time to collect all n coupons equals n times the nth harmonic number. This follows from telescoping the backward recurrence.

Time Complexity (Elimination)

O(k^3) time for a dense k×k system

Explanation: Solving a dense k×k linear system by Gaussian elimination takes cubic time. Use this when cycles prevent simple DP.

DAG Backward DP

O(S + T) time after topological ordering

Explanation: On a DAG, compute expectations from sinks to sources since dependencies flow forward only. This runs in linear time in the number of edges.

Complexity Analysis

Let S be the number of states and T the number of transitions. If the transition graph is a DAG (or can be made acyclic by monotonic progress), backward DP computes expectations in O(S + T) time and O(S) space after a topological order. Each state is processed once and each edge contributes a constant-time weighted accumulation. This is ideal for problems like expected steps to reach a sink in a directed acyclic graph or monotone board movement without backward edges. If cycles exist, the DP equations couple states, so we form a linear system over the non-absorbing states. In the worst case with k non-absorbing states and dense transitions, Gaussian elimination runs in O(k^3) time and O(k^2) space due to the augmented matrix. Partial pivoting improves numerical stability with negligible asymptotic overhead. When structure exists—like tridiagonal systems in 1D random walks (gambler’s ruin)—the specialized Thomas algorithm solves in O(k) time and O(k) space. Numerical complexity also depends on precision: double-precision floating point is usually adequate. However, when probabilities are extreme (very close to 0 or 1) or expected times are large, rounding error can accumulate. Using long double (80-bit on x86) or iterative refinement can improve accuracy. Preprocessing may be required to normalize transition probabilities per state to ensure each row sums to 1. Overall, choose: (1) O(S + T) backward DP for acyclic graphs; (2) specialized linear solvers for sparse/structured cyclic graphs; or (3) general Gaussian elimination for small-to-medium dense cyclic graphs.
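
As a sketch of the structured case just mentioned, the tridiagonal system for a symmetric 1D random walk (gambler’s ruin) can be solved with the Thomas algorithm in linear time. This is an added illustration; the walk length N is an assumption, and the closed form E[i] = i·(N − i) for the symmetric walk serves as a sanity check.

#include <bits/stdc++.h>
using namespace std;

// Thomas algorithm sketch: symmetric random walk on 0..N with absorbing ends.
// For 0 < i < N: E[i] = 1 + 0.5*E[i-1] + 0.5*E[i+1], i.e.
//   -0.5*E[i-1] + E[i] - 0.5*E[i+1] = 1,
// a tridiagonal system in the unknowns E[1..N-1].
int main() {
    int N = 10;                              // assumed walk length
    int m = N - 1;                           // number of unknowns E[1..N-1]
    vector<double> a(m, -0.5), b(m, 1.0), c(m, -0.5), d(m, 1.0);
    a[0] = 0.0; c[m - 1] = 0.0;              // absorbing ends contribute E = 0

    // Forward sweep: eliminate the subdiagonal in O(m).
    for (int i = 1; i < m; ++i) {
        double w = a[i] / b[i - 1];
        b[i] -= w * c[i - 1];
        d[i] -= w * d[i - 1];
    }
    // Back substitution.
    vector<double> E(m);
    E[m - 1] = d[m - 1] / b[m - 1];
    for (int i = m - 2; i >= 0; --i) E[i] = (d[i] - c[i] * E[i + 1]) / b[i];

    for (int i = 0; i < m; ++i)
        printf("E[%d] = %.6f (closed form: %d)\n", i + 1, E[i], (i + 1) * (N - i - 1));
    return 0;
}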

Code Examples

Backward DP on a DAG: Expected steps to reach a sink with given transition probabilities
#include <bits/stdc++.h>
using namespace std;

// Solve expected steps to reach any sink in a DAG.
// Each node u has outgoing edges (u -> v) with probabilities that sum to 1 (if outdegree > 0).
// Sink nodes have outdegree 0 and E[u] = 0.
// For non-sink u: E[u] = 1 + sum_{v} p(u->v) * E[v].

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n, m; // n nodes (0..n-1), m directed edges
    if (!(cin >> n >> m)) return 0;
    vector<vector<pair<int, double>>> adj(n);
    vector<int> indeg(n, 0), outdeg(n, 0);

    for (int i = 0; i < m; ++i) {
        int u, v; double p;
        cin >> u >> v >> p; // edge u->v with probability p
        adj[u].push_back({v, p});
        indeg[v]++;
        outdeg[u]++;
    }

    // Verify DAG via Kahn's algorithm (optional, but good safety for this example)
    vector<int> topo;
    queue<int> q;
    vector<int> indeg_copy(indeg);
    for (int i = 0; i < n; ++i) if (indeg_copy[i] == 0) q.push(i);
    while (!q.empty()) {
        int u = q.front(); q.pop();
        topo.push_back(u);
        for (auto [v, p] : adj[u]) {
            (void)p; // probability unused during the topological sort
            if (--indeg_copy[v] == 0) q.push(v);
        }
    }
    if ((int)topo.size() != n) {
        cerr << "Error: graph has cycles; use a linear solver instead.\n";
        return 0;
    }

    // Normalize probabilities per node (in case input wasn't normalized)
    for (int u = 0; u < n; ++u) {
        if (outdeg[u] == 0) continue;
        double sum = 0.0;
        for (auto &e : adj[u]) sum += e.second;
        if (sum <= 0) {
            cerr << "Invalid probabilities at node " << u << "\n";
            return 0;
        }
        for (auto &e : adj[u]) e.second /= sum;
    }

    // Compute expectations in reverse topological order
    vector<double> E(n, 0.0);
    for (int i = n - 1; i >= 0; --i) {
        int u = topo[i];
        if (outdeg[u] == 0) { E[u] = 0.0; continue; }
        double val = 1.0; // immediate cost of the current step
        for (auto [v, p] : adj[u]) val += p * E[v];
        E[u] = val;
    }

    cout.setf(std::ios::fixed); cout << setprecision(10);
    for (int i = 0; i < n; ++i) cout << E[i] << "\n";
    return 0;
}

We compute expected steps to reach any sink in a directed acyclic graph. For sinks, E=0. For other nodes, E[u] equals 1 (the current step) plus the probability-weighted expectations of successors. A reverse topological evaluation guarantees that when we process u, all E[v] for its successors v are already known. We also normalize probabilities per node for robustness.

Time: O(n + m), Space: O(n + m)
Coupon Collector via Backward DP: E[k] = n/(n-k) + E[k+1]
#include <bits/stdc++.h>
using namespace std;

// Compute expected number of draws to collect all n coupons.
// State k = how many distinct coupons already collected.
// Recurrence: E[k] = n/(n-k) + E[k+1], with E[n] = 0.
// This follows from: E[k] = 1 + (k/n) E[k] + ((n-k)/n) E[k+1].

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n; if (!(cin >> n)) return 0;
    vector<long double> E(n + 1, 0.0L);
    E[n] = 0.0L;
    for (int k = n - 1; k >= 0; --k) {
        E[k] = (long double)n / (long double)(n - k) + E[k + 1];
    }

    cout.setf(std::ios::fixed);
    cout << setprecision(10) << (double)E[0] << "\n";

    // Optional cross-check using harmonic numbers: E[T] = n * H_n
    long double H = 0.0L;
    for (int i = 1; i <= n; ++i) H += 1.0L / (long double)i;
    long double expected = (long double)n * H;
    cerr.setf(std::ios::fixed);
    cerr << setprecision(10) << "Check (n*H_n): " << (double)expected << "\n";
    return 0;
}

We model the coupon collector with a 1D DP over how many types have been collected. From state k, with probability k/n you repeat state k and with probability (n−k)/n you move to k+1. Rearranging gives a simple backward recurrence that telescopes to n·H_n. We use long double for better precision and output E[0], the expected number of draws from scratch.

Time: O(n), Space: O(n)
General Markov Chain Hitting Time via Gaussian Elimination
#include <bits/stdc++.h>
using namespace std;

// Solve E[i] for expected steps to hit any terminal state in a general Markov chain.
// States: 0..n-1. Given terminals set T. For non-terminals N, equations are:
//   E[i] = 1 + sum_j P[i][j] * E[j],  i in N
// with E[t] = 0 for t in T. Rearranged over N:
//   (I - P_NN) * x = 1, where x are E[i] for i in N.
// We'll build and solve the linear system with partial pivoting.

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n; if (!(cin >> n)) return 0;
    vector<int> terminal(n, 0);
    int k; cin >> k; // number of terminal states
    for (int i = 0; i < k; ++i) { int t; cin >> t; terminal[t] = 1; }

    vector<vector<long double>> P(n, vector<long double>(n, 0.0L));
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            long double p; cin >> p; P[i][j] = p; // row i should sum to 1 if i is non-terminal; terminal rows are irrelevant since E[t] = 0
        }
    }

    // Map non-terminal indices to columns
    vector<int> idx(n, -1), ridx; ridx.reserve(n);
    for (int i = 0; i < n; ++i) if (!terminal[i]) { idx[i] = (int)ridx.size(); ridx.push_back(i); }
    int m = (int)ridx.size();

    if (m == 0) {
        // All states are terminal; expectations are zero
        for (int i = 0; i < n; ++i) cout << fixed << setprecision(10) << 0.0 << (i+1==n?'\n':' ');
        return 0;
    }

    // Build augmented matrix A|b of size m x (m+1)
    vector<vector<long double>> A(m, vector<long double>(m + 1, 0.0L));
    for (int r = 0; r < m; ++r) {
        int i = ridx[r];
        // Left side: I - P_NN
        for (int j = 0; j < n; ++j) {
            if (terminal[j]) continue; // E[j] is unknown only if j is non-terminal
            int c = idx[j];
            long double coeff = (i == j ? 1.0L : 0.0L) - P[i][j];
            A[r][c] += coeff;
        }
        // Right side: b = 1
        A[r][m] = 1.0L;
    }

    // Gaussian elimination with partial pivoting
    const long double EPS = 1e-15L;
    for (int col = 0, row = 0; col < m && row < m; ++col) {
        int sel = row;
        for (int i = row; i < m; ++i) if (fabsl(A[i][col]) > fabsl(A[sel][col])) sel = i;
        if (fabsl(A[sel][col]) < EPS) continue; // singular or nearly singular; skip this column
        swap(A[sel], A[row]);
        long double div = A[row][col];
        for (int j = col; j <= m; ++j) A[row][j] /= div; // make pivot 1
        for (int i = 0; i < m; ++i) if (i != row) {
            long double factor = A[i][col];
            if (fabsl(factor) < EPS) continue;
            for (int j = col; j <= m; ++j) A[i][j] -= factor * A[row][j];
        }
        ++row;
    }

    // Extract solution x (if a column was skipped due to near-singularity, default 0)
    vector<long double> x(m, 0.0L);
    for (int i = 0; i < m; ++i) x[i] = A[i][m];

    // Compose full E[0..n-1]
    vector<long double> E(n, 0.0L);
    for (int i = 0; i < n; ++i) if (!terminal[i]) E[i] = x[idx[i]]; // terminals remain 0

    cout.setf(std::ios::fixed); cout << setprecision(10);
    for (int i = 0; i < n; ++i) cout << (double)E[i] << (i+1==n?'\n':' ');
    return 0;
}

We accept an n×n transition matrix and a set of terminal states. For each non-terminal i, we write E[i] − sum_{j in N} P[i][j] E[j] = 1 and build the augmented matrix (I − P_NN | 1). Gaussian elimination with partial pivoting solves for the unknown expectations even in the presence of cycles and self-loops. Terminal expectations are fixed to zero.

Time: O(m^3), where m is the number of non-terminal states; Space: O(m^2)