MathIntermediate

Catalan Numbers

Key Points

  • Catalan numbers count many 'non-crossing' and 'well-formed' structures like balanced parentheses, binary trees, Dyck paths, and triangulations of a convex polygon.
  • The nth Catalan number has the closed form = , which is fast to compute modulo a prime using factorials and modular inverses.
  • A fundamental recurrence is and = , reflecting a left/right split in many combinatorial objects.
  • The generating function is C(x) = , which implies xC(x)^2 - C(x) + 1 = 0 and leads to the recurrence.
  • Asymptotically, , so values grow very quickly.
  • To compute exactly in C++, use boost::multiprecision::cp and a multiplicative formula for central binomials.
  • To compute modulo a prime (e.g., 1e9+7), use = (n+1)^{-1} p with precomputed factorials and inverse factorials.
  • Be careful with off-by-one mappings: triangulations of an n-gon equal , and valid parentheses of length 2n equal .

Prerequisites

  • Binomial coefficientsCatalan numbers use central binomial coefficients in their closed form and require understanding nCk.
  • Modular arithmeticEfficient computation modulo a prime uses modular inverses and factorial precomputation.
  • Recurrences and dynamic programmingThe core Catalan recurrence is a convolution DP that builds C_n from smaller values.
  • Generating functions (basic)The Catalan generating function explains the recurrence and closed form.
  • Big integer arithmeticExact computation for moderate n requires arbitrary-precision integers.
  • Combinatorial bijectionsMapping problems (parentheses, trees, triangulations) to Catalan numbers often relies on bijective reasoning.

Detailed Explanation

Tap terms for definitions

01Overview

Catalan numbers form a famous sequence in combinatorics that counts a surprising variety of structured objects. The nth Catalan number, typically denoted C_n, appears whenever you build something recursively by splitting a problem into two independent subproblems whose sizes add up to n−1. Classic examples include the number of ways to place n pairs of parentheses so that they are balanced, the number of full binary trees with n+1 leaves (or n internal nodes), the number of ways to triangulate a convex polygon with n+2 vertices, the number of non-crossing matchings among 2n points on a circle, and the number of Dyck paths of semilength n (paths of up-steps and down-steps that never go below the x-axis). Formally, Catalan numbers satisfy both a simple recurrence and a closed form. The recurrence reflects the “split into left and right parts” idea, while the closed form involves central binomial coefficients. The sequence starts C_0 = 1, C_1 = 1, C_2 = 2, C_3 = 5, C_4 = 14, C_5 = 42, and grows rapidly thereafter. In algorithmic problem solving and competitive programming, Catalan numbers are a go-to tool for counting these non-crossing or well-formed objects, often modulo a large prime like 1,000,000,007. Efficient computation relies on factorial precomputation and modular inverses, or on big integers when exact values are required.

02Intuition & Analogies

Imagine building a properly nested set of brackets. To construct any valid sequence with n pairs, the first '(' must pair with some ')' later. Whatever is inside that pair is itself a valid sequence with i pairs, and whatever follows is a valid sequence with the remaining n−1−i pairs. This “inside vs. after” split covers all possibilities without overlap, which is why we add C_i × C_{n−1−i} for all i. The same idea appears in binary trees: choose a root; the left subtree uses i nodes, and the right uses n−1−i. Each split is independent, so the counts multiply, and summing over all splits gives the total. Another picture is the handshake problem: 2n people sit around a circular table and shake hands in pairs with no handshakes crossing. Fix the partner of person 1; that partition divides the circle into two smaller arcs that must each have non-crossing handshakes. Again, multiply and sum. A Dyck path offers a hiking analogy: you take up-steps and down-steps, never going below ground level, and end at ground level after 2n steps. When the first return to ground happens, what’s inside and after forms two smaller valid hikes whose lengths add to 2(n−1). Across all these stories, a hallmark is splitting a structure into two independent, order-matters parts that collectively use up n−1 units of “complexity” (pairs, nodes, chords). That’s the beating heart of the Catalan recurrence and the reason Catalan numbers appear so pervasively.

03Formal Definition

The Catalan numbers {}_{n 0} are defined by and, for n 1, by the recurrence = \, . Equivalently, their ordinary generating function C(x) = satisfies the quadratic equation x C(x)^2 - C(x) + 1 = 0, whose solution with C(0)=1 is C(x) = . The closed-form expression is = = - . This identity follows from the binomial expansion of (1-4x)^{-1/2} and Lagrange inversion or via a classic ballot/Dyck path argument. Asymptotically, grows like , obtained by singularity analysis of the generating function near x = or by Stirling’s approximation of the central binomial coefficient. Catalan numbers enumerate many classes in bijection: balanced parenthesis strings with n pairs, full binary trees with n internal nodes, non-crossing perfect matchings of 2n labeled points on a circle, triangulations of a convex (n+2)-gon, and Dyck paths of semilength n, among others.

04When to Use

Use Catalan numbers when your counting problem features a recursive decomposition into two independent parts whose sizes sum to a fixed total, often n−1. Typical triggers include: balanced parentheses or bracketings; constructing binary search trees from in-order sequences (number of BST shapes with n distinct keys is C_n); non-crossing pairings on a circle (handshakes, chords, matching parentheses as geometric arcs); Dyck or ballot-like lattice paths that never go below the diagonal; and triangulations of a convex polygon (triangulations of an n-gon equal C_{n-2}). In competitive programming, problems frequently ask for counts modulo a prime p, so the formula C_n = \binom{2n}{n} \cdot (n+1)^{-1} \bmod p with factorial precomputation is ideal for fast multiple queries. If you need exact integers (for small to moderate n), use big integers and the multiplicative form of central binomial coefficients. If the problem size is small or n is only up to a few thousands, the O(n^2) DP recurrence is straightforward and robust. Recognize Catalan cues: nesting, non-crossing, full binary structures, 'never-dip-below-zero' paths, or statements like 'choose a split point, then multiply left × right and sum over splits.'

⚠️Common Mistakes

• Off-by-one mapping: triangulations of a convex polygon with m vertices equal C_{m-2}, not C_m. Similarly, full binary trees with n internal nodes or with n+1 leaves correspond to C_n. Always double-check the parameter linking your problem to n. • Overflow from naive arithmetic: computing \binom{2n}{n} in 32-bit or even 64-bit integers overflows quickly. Use modular arithmetic with factorials and inverses for modulo problems, or big integers for exact values. • Dividing modulo a composite or forgetting inverses: you cannot simply do “/ (n+1)” under modulo arithmetic. Use a modular inverse (valid when the modulus is prime and n+1 is not a multiple of the modulus) or avoid division by using C_n = \binom{2n}{n} - \binom{2n}{n+1} (still needs inverses inside binomials). • Wrong base cases or indexing: the standard indexing has C_0 = 1. Some interpretations start from n=1; mixing these conventions can shift answers. Ensure your DP arrays and loop bounds match the chosen definition. • Misidentifying the pattern: not every 'two-part split' is Catalan. If the total doesn’t sum to n−1, or if the parts interact in more complex ways than independent multiplication, the recurrence changes. Watch for constraints that allow crossings, which invalidate a Catalan model. • Inefficient generation: generating all balanced parentheses is exponential. If only the count is required, don’t backtrack; compute C_n with DP or the closed form modulo a prime.

Key Formulas

Closed Form

Explanation: The nth Catalan number equals the central binomial coefficient divided by n+1. This is the fastest way to compute modulo a prime using factorials and modular inverses.

Ballot Identity

Explanation: An equivalent closed form that avoids explicit division. It follows from the identity = + .

Catalan Recurrence

Explanation: Each object with n units splits into a left part of size i and a right part of size n−1−i. Counts multiply because the parts are independent, and we sum over all splits.

Generating Function

Explanation: The ordinary generating function for Catalan numbers is algebraic. Expanding the square root gives the coefficients .

Quadratic Equation for C(x)

Explanation: Rearranged generating function identity encapsulating the recurrence. Coefficient extraction recovers and = .

Asymptotic Growth

Explanation: Catalan numbers grow roughly like 4^n scaled by a polynomial factor . This follows from Stirling’s approximation or singularity analysis.

Central Binomial Asymptotic

Explanation: A classic asymptotic that, combined with division by n+1, yields the Catalan asymptotic. Useful for estimating sizes.

Narayana Numbers

Explanation: Refines Catalan counts by the number of peaks in Dyck paths. Summing N(n,k) over ..n gives .

Triangulations Relation

Explanation: The number of triangulations of a convex m-gon equals the Catalan number indexed by m−2. Watch the index shift.

Catalan Convolution

Explanation: A shifted version of the recurrence. It appears when appending a root (or outermost pair) to a structure of size n.

Complexity Analysis

There are three common computational strategies, each with distinct costs. The dynamic programming recurrence = computes all values up to n in O() time and O(n) space, since each depends on all earlier values. This is robust and easy but too slow for very large n (e.g., ). The closed form = enables O(1) amortized queries after O(n) preprocessing modulo a prime p. Precompute factorials fact[i] and inverse factorials ifact[i] up to 2n in O(n) time and space; each binomial is then O(1), and multiplication by (n+1)^{-1} uses a modular inverse computed in O( p) if not precomputed. For many queries with varying n (bounded by N), precompute inv[i] or inverse factorials once to keep each query O(1). When exact integer values are required, use big integers. The multiplicative method for ..n: )/k computes with n exact divisions, then divides by (n+1). The arithmetic complexity depends on the size of the integers: the number of bits in is (n), so naive big-int multiplication leads to roughly O() bit operations overall; using advanced big-int multiplication can reduce this. Memory grows with the number of digits, (n), which is feasible for moderate n (e.g., n up to a few thousands). In generation problems (like listing all balanced parentheses), time is proportional to the output size, which is (), so exponential in n; use counting methods instead when only the count is needed.

Code Examples

Exact Catalan numbers with big integers (closed form, cpp_int)
1#include <bits/stdc++.h>
2#include <boost/multiprecision/cpp_int.hpp>
3using namespace std;
4using boost::multiprecision::cpp_int;
5
6// Compute central binomial coefficient C(2n, n) using a multiplicative loop
7// c_k = C(n+k, k) stays integer at each step, avoiding huge intermediate denominators
8static cpp_int central_binom(unsigned n) {
9 cpp_int c = 1; // c = C(n+0, 0)
10 for (unsigned k = 1; k <= n; ++k) {
11 c = c * (n + k) / k; // exact division at every step
12 }
13 return c; // equals C(2n, n)
14}
15
16cpp_int catalan_cpp_int(unsigned n) {
17 cpp_int c2n_n = central_binom(n);
18 return c2n_n / (n + 1); // exact division
19}
20
21int main() {
22 ios::sync_with_stdio(false);
23 cin.tie(nullptr);
24
25 unsigned n;
26 if (!(cin >> n)) return 0;
27
28 cpp_int Cn = catalan_cpp_int(n);
29 cout << Cn << "\n"; // exact value for moderate n (e.g., up to a few thousand)
30 return 0;
31}
32

This program computes the exact Catalan number C_n using big integers. It first computes the central binomial coefficient via a multiplicative identity that guarantees exact division at each step, then divides by (n+1). Using cpp_int avoids overflow and yields exact results.

Time: O(n M(b)) where M(b) is the cost of big-integer multiplication on ~b-bit numbers (b = Θ(n)); with naive multiplication, roughly O(n^2) bit operations.Space: O(b) big-integer digits, where b = Θ(n) is the number of bits of C_n.
Catalan modulo a prime using factorial precomputation
1#include <bits/stdc++.h>
2using namespace std;
3
4static const long long MOD = 1000000007LL; // prime
5
6long long modpow(long long a, long long e) {
7 long long r = 1 % MOD;
8 while (e > 0) {
9 if (e & 1) r = (__int128)r * a % MOD;
10 a = (__int128)a * a % MOD;
11 e >>= 1;
12 }
13 return r;
14}
15
16int main() {
17 ios::sync_with_stdio(false);
18 cin.tie(nullptr);
19
20 int q; // number of queries
21 if (!(cin >> q)) return 0;
22 vector<int> ns(q);
23 int maxn = 0;
24 for (int i = 0; i < q; ++i) { cin >> ns[i]; maxn = max(maxn, ns[i]); }
25
26 // Precompute factorials and inverse factorials up to 2*maxn
27 int LIM = 2 * maxn;
28 vector<long long> fact(LIM + 1), ifact(LIM + 1);
29 fact[0] = 1;
30 for (int i = 1; i <= LIM; ++i) fact[i] = (__int128)fact[i-1] * i % MOD;
31 ifact[LIM] = modpow(fact[LIM], MOD - 2); // Fermat's little theorem
32 for (int i = LIM; i > 0; --i) ifact[i-1] = (__int128)ifact[i] * i % MOD;
33
34 auto nCk = [&](int n, int k) -> long long {
35 if (k < 0 || k > n) return 0;
36 return (long long)(((__int128)fact[n] * ifact[k] % MOD) * ifact[n-k] % MOD);
37 };
38
39 for (int n : ns) {
40 long long c2n_n = nCk(2*n, n);
41 long long inv_n1 = modpow(n + 1, MOD - 2);
42 long long catalan = (long long)((__int128)c2n_n * inv_n1 % MOD);
43 cout << catalan << "\n";
44 }
45 return 0;
46}
47

For multiple queries, we precompute factorials and inverse factorials modulo a prime p to allow O(1) binomial queries. Then we use C_n = C(2n, n) · (n+1)^{-1} mod p, where the inverse is computed via Fermat's little theorem.

Time: O(maxn) preprocessing and O(1) per query (plus O(\log MOD) if not precomputing inverses for n+1).Space: O(maxn) for factorial and inverse factorial arrays.
Generate all balanced parentheses and verify Catalan count (small n)
1#include <bits/stdc++.h>
2using namespace std;
3
4// Simple DP to compute Catalan numbers up to n (64-bit is safe up to n≈19)
5static vector<unsigned long long> catalan_dp(int n) {
6 vector<unsigned long long> C(n+1, 0);
7 C[0] = 1;
8 for (int k = 1; k <= n; ++k) {
9 unsigned long long sum = 0;
10 for (int i = 0; i < k; ++i) sum += C[i] * C[k-1-i];
11 C[k] = sum;
12 }
13 return C;
14}
15
16void gen(int n, int open, int close, string& cur, vector<string>& out) {
17 // open = number of '(' used; close = number of ')' used
18 if ((int)cur.size() == 2*n) { out.push_back(cur); return; }
19 if (open < n) {
20 cur.push_back('(');
21 gen(n, open+1, close, cur, out);
22 cur.pop_back();
23 }
24 if (close < open) {
25 cur.push_back(')');
26 gen(n, open, close+1, cur, out);
27 cur.pop_back();
28 }
29}
30
31int main() {
32 ios::sync_with_stdio(false);
33 cin.tie(nullptr);
34
35 int n; if (!(cin >> n)) return 0;
36 if (n > 15) {
37 cerr << "n too large for generation; try n <= 15\n";
38 return 0;
39 }
40
41 vector<string> all;
42 string cur;
43 gen(n, 0, 0, cur, all);
44
45 auto C = catalan_dp(n);
46
47 cout << "Generated " << all.size() << " sequences\n";
48 cout << "Catalan(" << n << ") via DP = " << C[n] << "\n";
49
50 // Optionally print them
51 for (const auto& s : all) cout << s << '\n';
52
53 return 0;
54}
55

This program backtracks to generate all balanced parentheses strings of length 2n and uses a DP recurrence to compute Catalan(n). For small n, it demonstrates that the number of generated strings matches C_n.

Time: O(C_n) to generate and print all sequences (output-sensitive and exponential in n). The DP verification costs O(n^2).Space: O(n) recursion depth plus O(C_n · n) to store all sequences.
Non-crossing handshakes among 2n people (Catalan recurrence, modulo 1e9+7)
1#include <bits/stdc++.h>
2using namespace std;
3
4static const int MOD = 1000000007;
5
6int main() {
7 ios::sync_with_stdio(false);
8 cin.tie(nullptr);
9
10 int n; // number of pairs (2n people)
11 if (!(cin >> n)) return 0;
12
13 // H[k] = number of non-crossing perfect matchings among 2k points on a circle
14 // Recurrence: H[0] = 1; H[k] = sum_{i=0}^{k-1} H[i] * H[k-1-i]
15 vector<int> H(n+1, 0);
16 H[0] = 1;
17 for (int k = 1; k <= n; ++k) {
18 long long s = 0;
19 for (int i = 0; i < k; ++i) s = (s + 1LL * H[i] * H[k-1-i]) % MOD;
20 H[k] = (int)s;
21 }
22
23 cout << H[n] << '\n'; // equals C_n mod MOD
24 return 0;
25}
26

We model 2n people on a circle and pair them with non-crossing handshakes. Fixing the partner of person 1 splits the remaining people into two arcs; each arc must be matched independently, yielding the Catalan recurrence. We compute the result modulo 1e9+7.

Time: O(n^2) using the direct recurrence.Space: O(n) for the DP array.