
Matrix Exponentiation - Advanced

Key Points

  • Matrix exponentiation turns repeated linear transitions into a fast O(m^{3} log k) computation for an m×m state using exponentiation by squaring.
  • For graphs, (A^{L})_{u,v} equals the number of directed walks of exactly L edges from u to v.
  • To count walks of at most K edges, raise the 2N×2N block matrix [[A, I], [0, I]] to the power K+1 and read the top-right block as S_{K} = I + A + ... + A^{K}.
  • Linear recurrences can be computed by powering a constant transition matrix built from the recurrence’s coefficients.
  • Non-homogeneous recurrences with polynomial-in-n additive terms are handled by augmenting the state with [1, n, n^{2}, ...] and using binomial transitions.
  • For very large n (up to 10^{18}), matrix powering remains feasible as long as the state dimension is small (roughly ≤ 100).
  • Be careful with identity matrices, multiplication order, and modular overflow; these are common sources of bugs.
  • Exploit sparsity or block structure when possible to cut constants and speed up multiplication.

Prerequisites

  • Linear algebra basics (vectors, matrices, matrix multiplication): Matrix exponentiation relies on composing linear transformations via matrix multiplication.
  • Modular arithmetic: Most problems require counting modulo a prime or integer to avoid overflow.
  • Exponentiation by squaring (scalar): The matrix version generalizes the same binary powering idea.
  • Graph representations (adjacency matrix vs adjacency list): Understanding how adjacency matrices encode walks is crucial for path counting.
  • Linear recurrences: Companion matrices derive from how linear recurrences advance state.
  • Binomial theorem: Updating [1, n, n^{2}, ...] when increasing n uses binomial coefficients.

Detailed Explanation


01 Overview

Matrix exponentiation is a technique that converts repeated linear updates into one matrix power. Imagine you have a process where the next state is a fixed linear combination of the current state’s components—like stepping along edges in a graph, or advancing a linear recurrence such as Fibonacci. If we package the state as a vector v and the update rule as a square matrix T, then after k steps the state becomes T^{k} v. Directly simulating k steps is O(k), which is too slow for very large k (e.g., 10^{18}). Matrix exponentiation uses exponentiation by squaring to compute T^{k} in O(n^{3} log k) time for n×n matrices using naive multiplication, shrinking a huge linear-time loop down to logarithmic in k. This technique is especially powerful in competitive programming for problems at medium-to-advanced difficulty where k is huge but the state dimension is modest. Applications include counting walks in graphs with adjacency matrices, summing powers of matrices using block-matrix tricks, and solving linear recurrences (including those with additive polynomial terms) by augmenting the state. The method is modular: once you code a solid matrix multiply and power routine, you can drop in different problem-specific transition matrices to solve a wide range of tasks.

02 Intuition & Analogies

Think of a city map with intersections as nodes and one-way streets as edges. The adjacency matrix A records which roads lead where. If you drive exactly one street, A tells you your options. If you drive two streets, it’s like composing choices twice: that’s A^{2}. In general, A^{L} tells you all possible L-street routes between intersections. Now, doing L steps one by one is tedious when L is huge. Instead, we bundle steps in powers of two: 1, 2, 4, 8, ... If you can jump 8 steps with A^{8} and 32 steps with A^{32}, you can reach any L by combining a few such jumps whose lengths add up to L (binary expansion). This is exponentiation by squaring: double the step size by squaring the matrix, and include certain powers depending on the bits of L. The same idea applies beyond graphs. For sequences like Fibonacci, the next pair (F_{n+1}, F_{n}) is a linear function of (F_{n}, F_{n-1}). Package that as a 2×2 matrix, and you can jump forward to F_{n} in O(log n) time. For sums like I + A + A^{2} + ... + A^{K}, there’s a neat trick: build a larger, two-floor building of states. The top floor advances by A and also “accumulates” the identity from the lower floor, so after K+1 jumps, the top-right block holds the sum of powers. For recurrences that add a polynomial in n each step, track [1, n, n^{2}, ...] in your state; updating n to n+1 obeys the binomial theorem, so it’s still a fixed linear transition. In all cases, the story is the same: linear transitions compose nicely as matrix multiplication, and powering lets you leap across many steps quickly.

03 Formal Definition

Let T be a fixed transition matrix over a field or ring (often integers modulo a prime), and let v_{0} be the initial state. Define v_{k} = T^{k} v_{0}. Exponentiation by squaring computes T^{k} using the identities: if k is even, T^{k} = (T^{k/2})^{2}; if k is odd, T^{k} = T · T^{k-1}. For a directed graph with adjacency matrix A, the entry (A^{L})_{u,v} equals the number of directed walks of length exactly L from u to v. The geometric sum of powers S_{K} = I + A + A^{2} + ... + A^{K} can be embedded into a block matrix M = \begin{bmatrix} A & I \\ 0 & I \end{bmatrix}, for which M^{K+1} = \begin{bmatrix} A^{K+1} & S_{K} \\ 0 & I \end{bmatrix}. Linear recurrences a_{n} = c_{1} a_{n-1} + ... + c_{r} a_{n-r} can be represented with a companion matrix C so that [a_{n}, a_{n-1}, ..., a_{n-r+1}]^{T} = C [a_{n-1}, a_{n-2}, ..., a_{n-r}]^{T}. For non-homogeneous recurrences with polynomial additive terms, a_{n} = c_{1} a_{n-1} + ... + c_{r} a_{n-r} + P(n) where P is a degree-d polynomial, augment the state with [1, n, n^{2}, ..., n^{d}] and use the binomial identity (n+1)^{t} = \sum_{j=0}^{t} \binom{t}{j} n^{j} to build a constant block that updates powers of n. Then the total transition remains linear and time-invariant, allowing matrix exponentiation.

04 When to Use

Use matrix exponentiation when: (1) You have a small, fixed-dimension linear state transition, and you need to advance it a huge number of steps (k up to 10^{18}). Examples: computing the K-th term of a linear recurrence, simulating a DP with constant-size state, or evolving Markov-like transitions modulo a number. (2) You need to count directed walks of fixed length in a small graph (n ≤ 100–200), where adjacency power gives exact counts modulo some MOD. (3) You need sums of powers of a matrix, such as counting walks of at most K edges: the block matrix trick delivers \sum_{i=0}^{K} A^{i} in essentially the same complexity as a single power. (4) The transition depends on n only via additive polynomials (e.g., a_{n+1} = c_{1} a_{n} + c_{2} a_{n-1} + P(n)); augmenting [1, n, n^{2}, ...] keeps the transition linear and constant. (5) The problem size makes naive DP or simulation too slow but the transition dimension is modest, as common in CF 1800–2200 problems. Avoid it when the matrix dimension is large (e.g., thousands) unless it is extremely sparse and you implement sparse multiplication, or if transitions are non-linear or depend on complex history that cannot be captured by a small fixed-dimensional linear state.

⚠️ Common Mistakes

  1. Wrong identity usage: forgetting that A^{0} = I leads to off-by-one errors, especially when combining initial steps.
  2. Multiplication order confusion: matrix multiplication is not commutative. Carefully follow v_{k+1} = T v_{k} and build transitions so rows encode how new states depend on old ones.
  3. Overflow: intermediate products can exceed 64-bit unless you reduce modulo at every multiply-add. In C++, use int64 with modular reduction or int128 for safe intermediate products.
  4. Mixing 0/1-based node indices: an off-by-one index in adjacency matrices breaks path counts.
  5. Building the block matrix incorrectly for sums: the correct block is [[A, I], [0, I]] and you must take power K+1 to read S_{K} in the top-right block.
  6. Treating non-homogeneous or n-dependent recurrences without augmenting state: additive polynomials require tracking [1, n, n^{2}, ...], and coefficient-polynomial multipliers on a_{n} require more elaborate augmentation (e.g., tracking n^{j} a_{n-i}).
  7. Ignoring sparsity: dense O(n^{3}) multiplication is wasteful when matrices are sparse; either compress or at least skip zero entries.
  8. Mis-specified base vector: for recurrences, ensure the initial state matches the power you apply (e.g., apply C^{N-r+1} to the vector [a_{r-1}, ..., a_{0}]^{T} to get a_{N}).

Key Formulas

Walk counting by powers

(A^{L})_{u,v} = \sum_{w_{1}, ..., w_{L-1}} A_{u,w_{1}} A_{w_{1},w_{2}} \cdots A_{w_{L-1},v}

Explanation: Raising the adjacency matrix to the L-th power composes transitions L times. The (u,v) entry sums over all intermediate nodes to count walks.

Matrix geometric sum

S_{K} = I + A + A^{2} + ... + A^{K} = \sum_{i=0}^{K} A^{i}

Explanation: This sum accumulates the effect of all lengths up to K. For adjacency matrices, (S_{K})_{u,v} counts walks with at most K edges.

Block-matrix sum trick

M = \begin{bmatrix} A & I \\ 0 & I \end{bmatrix} \implies M^{K+1} = \begin{bmatrix} A^{K+1} & S_{K} \\ 0 & I \end{bmatrix}

Explanation: Embedding A inside a 2N×2N upper-triangular block matrix makes the top-right block accumulate the geometric series. Exponentiating M once yields both A^{K+1} and the sum S_{K}.

State evolution

v_{k} = T^{k} v_{0}

Explanation: If each step is a fixed linear transformation T, then after k steps the state equals T raised to k, applied to the initial state v_{0}. This is the core of matrix exponentiation.

Companion matrix

C = \begin{bmatrix} c_{1} & c_{2} & \cdots & c_{r-1} & c_{r} \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}

Explanation: For a_{n} = c_{1} a_{n-1} + ... + c_{r} a_{n-r}, this matrix advances [a_{n-1}, ..., a_{n-r}]^{T} to [a_{n}, ..., a_{n-r+1}]^{T}. Powering C jumps to the desired index.

Binomial shift for powers of n

(n+1)^{t} = \sum_{j=0}^{t} \binom{t}{j} n^{j}

Explanation: This identity updates the polynomial basis [1, n, n^{2}, ...] when incrementing n to n+1. It lets us keep a constant transition matrix for polynomial additive terms.

Exponentiation by squaring complexity

T(m, k) = O(m^{3} \log k) for an m×m matrix

Explanation: Each recursive level performs one matrix multiply (or two) of cost O(m^{3}) and halves the exponent. The total time is O(m^{3} \log k).

Closed form geometric sum

(I - A) S_{K} = I - A^{K+1}, so S_{K} = (I - A)^{-1} (I - A^{K+1}) when I - A is invertible

Explanation: Over fields where I−A is invertible, the matrix geometric series has a closed form analogous to scalars. In modular arithmetic it may not be usable; the block trick is robust.

Diagonalization

A = P D P^{-1} \implies A^{k} = P D^{k} P^{-1}

Explanation: If A is diagonalizable, powers are easy via eigen decomposition. This is mainly theoretical in CP; we usually avoid floating point and stick to modular arithmetic.

At-most-K walk count

(S_{K})_{u,v} = \sum_{i=0}^{K} (A^{i})_{u,v}

Explanation: The (u,v) entry of S_{K} counts all walks from u to v whose lengths are between 0 and K. Exclude the i = 0 identity term if you do not want zero-length walks.

Complexity Analysis

Let m be the dimension of the square matrix you exponentiate. Using classical multiplication, one multiply is O(m^{3}) time and O(m^{2}) space. Exponentiation by squaring computes the power in O(log k) multiplies, so total time is O(m^{3} log k) and auxiliary space is O(m^{2}) (to store a few matrices). For adjacency-matrix walk counting, m = N (the number of vertices). For the at-most-K sum via the 2N×2N block matrix, the dimension doubles, so the cost becomes O((2N)^{3} log K) = O(8 N^{3} log K); this is a constant-factor overhead but the same asymptotic order. For linear recurrences of order r with a degree-d polynomial additive term, the augmented state has size r + (d+1), because we append [1, n, ..., n^{d}], so time is O((r+d+1)^{3} log n). Memory is dominated by storing a constant number of m×m matrices, which is O(m^{2}). Practically, with careful constant factors (loop ordering, skipping zeros, unrolling small r), dimensions up to 60–100 are often feasible under typical CP time limits. If matrices are sparse (e.g., average degree small), a sparse multiply can reduce time to O(nnz · m · log k), where nnz is the average number of nonzeros per row, although implementation is more involved. Fast matrix multiplication (Strassen or better) can lower the exponent from 3 to about 2.81 or less in theory, but is rarely beneficial in contest settings due to high constants and implementation complexity.

Code Examples

Reusable square-matrix template with fast power (modulo) + Fibonacci demo
#include <bits/stdc++.h>
using namespace std;

static const long long MOD = 1'000'000'007LL;

struct Matrix {
    int n;                       // dimension (n x n)
    vector<vector<long long>> a; // entries modulo MOD
    Matrix(int n_ = 0, bool ident = false) : n(n_), a(n_, vector<long long>(n_, 0)) {
        if (ident) {
            for (int i = 0; i < n; ++i) a[i][i] = 1;
        }
    }
};

// Multiply two n x n matrices (naive O(n^3))
Matrix mul(const Matrix &A, const Matrix &B) {
    int n = A.n;
    Matrix C(n);
    for (int i = 0; i < n; ++i) {
        for (int k = 0; k < n; ++k) {
            long long aik = A.a[i][k];
            if (!aik) continue; // small skip for sparsity
            for (int j = 0; j < n; ++j) {
                C.a[i][j] = (C.a[i][j] + aik * B.a[k][j]) % MOD;
            }
        }
    }
    return C;
}

// Fast exponentiation: compute A^e
Matrix mpow(Matrix base, unsigned long long e) {
    int n = base.n;
    Matrix res(n, true); // identity
    while (e > 0) {
        if (e & 1ULL) res = mul(res, base);
        base = mul(base, base);
        e >>= 1ULL;
    }
    return res;
}

int main() {
    // Demo: Fibonacci via matrix exponentiation.
    // F_0 = 0, F_1 = 1, F_{n+1} = F_n + F_{n-1}
    // State: [F_n, F_{n-1}]^T; Transition T = [[1,1],[1,0]]

    unsigned long long n; // can be large
    cin >> n;

    if (n == 0) { cout << 0 << "\n"; return 0; }

    Matrix T(2);
    T.a = {{1,1},{1,0}};

    // We want F_n. For n >= 1, [F_n, F_{n-1}]^T = T^{n-1} [F_1, F_0]^T = T^{n-1} [1,0]^T
    Matrix P = mpow(T, n - 1);

    // Multiply P by the vector [1,0]^T: the first component of the result is F_n
    long long Fn = (P.a[0][0] * 1 + P.a[0][1] * 0) % MOD;
    cout << Fn % MOD << "\n";
    return 0;
}

This code defines a minimal square-matrix struct, naive O(n^{3}) multiplication, and exponentiation by squaring. The demo computes the n-th Fibonacci number using the classic 2×2 transition matrix. It handles very large n because the number of multiplies grows like log n.

Time: O(n^3 log e) for an n×n matrix and exponent e (here n = 2). Space: O(n^2) to store a few matrices (here constant).
Count number of directed walks of exactly L edges in a graph
#include <bits/stdc++.h>
using namespace std;

static const long long MOD = 1'000'000'007LL;

struct Matrix {
    int n;
    vector<vector<long long>> a;
    Matrix(int n_ = 0, bool ident = false) : n(n_), a(n_, vector<long long>(n_, 0)) {
        if (ident) for (int i = 0; i < n; ++i) a[i][i] = 1;
    }
};

Matrix mul(const Matrix &A, const Matrix &B) {
    int n = A.n;
    Matrix C(n);
    for (int i = 0; i < n; ++i) {
        for (int k = 0; k < n; ++k) {
            long long v = A.a[i][k];
            if (!v) continue;
            for (int j = 0; j < n; ++j) {
                C.a[i][j] = (C.a[i][j] + v * B.a[k][j]) % MOD;
            }
        }
    }
    return C;
}

Matrix mpow(Matrix base, unsigned long long e) {
    Matrix res(base.n, true);
    while (e) {
        if (e & 1ULL) res = mul(res, base);
        base = mul(base, base);
        e >>= 1ULL;
    }
    return res;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int N, M; unsigned long long L; int s, t;
    // Input: N nodes, M edges, walk length L, start s, end t (1-indexed)
    // Example input:
    // 4 5 10 1 3
    // 1 2
    // 2 3
    // 3 4
    // 4 2
    // 1 3
    cin >> N >> M >> L >> s >> t;

    Matrix A(N);
    for (int i = 0; i < M; ++i) {
        int u, v; cin >> u >> v; --u; --v;
        A.a[u][v] = (A.a[u][v] + 1) % MOD; // allow multi-edges by incrementing
    }

    Matrix AL = mpow(A, L);
    cout << AL.a[s-1][t-1] % MOD << "\n";
    return 0;
}

Build the adjacency matrix A, raise it to power L, and output the (s,t) entry. This counts the number of directed walks of exactly L edges from s to t modulo MOD. Multi-edges are handled by incrementing the adjacency entry.

Time: O(N^3 log L). Space: O(N^2).
Count walks with at most K edges via the block-matrix trick
#include <bits/stdc++.h>
using namespace std;

static const long long MOD = 1'000'000'007LL;

struct Matrix {
    int n;
    vector<vector<long long>> a;
    Matrix(int n_ = 0, bool ident = false) : n(n_), a(n_, vector<long long>(n_, 0)) {
        if (ident) for (int i = 0; i < n; ++i) a[i][i] = 1;
    }
};

Matrix mul(const Matrix &A, const Matrix &B) {
    int n = A.n;
    Matrix C(n);
    for (int i = 0; i < n; ++i) {
        for (int k = 0; k < n; ++k) {
            long long v = A.a[i][k];
            if (!v) continue;
            for (int j = 0; j < n; ++j) {
                C.a[i][j] = (C.a[i][j] + v * B.a[k][j]) % MOD;
            }
        }
    }
    return C;
}

Matrix mpow(Matrix base, unsigned long long e) {
    Matrix res(base.n, true);
    while (e) {
        if (e & 1ULL) res = mul(res, base);
        base = mul(base, base);
        e >>= 1ULL;
    }
    return res;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int N, M; unsigned long long K; int s, t;
    // Input: N nodes, M edges, K max length, start s, end t (1-indexed)
    cin >> N >> M >> K >> s >> t;

    Matrix A(N);
    for (int i = 0; i < M; ++i) {
        int u, v; cin >> u >> v; --u; --v;
        A.a[u][v] = (A.a[u][v] + 1) % MOD;
    }

    // Build block matrix B = [[A, I], [0, I]] of size 2N x 2N
    Matrix B(2 * N);
    for (int i = 0; i < N; ++i)                        // top-left A
        for (int j = 0; j < N; ++j) B.a[i][j] = A.a[i][j];
    for (int i = 0; i < N; ++i) B.a[i][N + i] = 1;     // top-right I
    for (int i = 0; i < N; ++i) B.a[N + i][N + i] = 1; // bottom-right I
    // Bottom-left is already zero

    // Compute B^{K+1}
    Matrix BK1 = mpow(B, K + 1);

    // Extract S_K = sum_{i=0}^{K} A^{i} from the top-right N x N block of BK1
    long long ans = BK1.a[s-1][N + (t-1)]; // (s,t) entry in the top-right block

    // Note: This includes the i=0 term (identity), i.e., the zero-length walk when s==t.
    // If you want strictly positive lengths, subtract 1 when s==t.
    cout << (ans % MOD + MOD) % MOD << "\n";
    return 0;
}

We construct the 2N×2N block matrix [[A, I], [0, I]]. The top-right N×N block of B^{K+1} equals I + A + A^{2} + ... + A^{K}. Reading the (s,t) entry of that block gives the number of walks from s to t with length at most K (including length 0).

Time: O((2N)^3 log K) = O(N^3 log K). Space: O((2N)^2).
Linear recurrence with polynomial additive term via augmented state
#include <bits/stdc++.h>
using namespace std;

static const long long MOD = 1'000'000'007LL;

struct Matrix {
    int n;
    vector<vector<long long>> a;
    Matrix(int n_ = 0, bool ident = false) : n(n_), a(n_, vector<long long>(n_, 0)) {
        if (ident) for (int i = 0; i < n; ++i) a[i][i] = 1;
    }
};

Matrix mul(const Matrix &A, const Matrix &B) {
    int n = A.n;
    Matrix C(n);
    for (int i = 0; i < n; ++i) {
        for (int k = 0; k < n; ++k) {
            long long v = A.a[i][k];
            if (!v) continue;
            for (int j = 0; j < n; ++j) {
                C.a[i][j] = (C.a[i][j] + v * B.a[k][j]) % MOD;
            }
        }
    }
    return C;
}

Matrix mpow(Matrix base, unsigned long long e) {
    Matrix res(base.n, true);
    while (e) {
        if (e & 1ULL) res = mul(res, base);
        base = mul(base, base);
        e >>= 1ULL;
    }
    return res;
}

// Example recurrence:
// a_n = 2*a_{n-1} + 3*a_{n-2} + (5*n^2 + 7*n + 11), for n >= 2
// Given a_0, a_1, compute a_N for large N.
// We build state s_n = [a_n, a_{n-1}, 1, n, n^2]^T so that s_{n+1} = T s_n.
// Polynomial basis update uses binomial expansion:
// [1, n, n^2]^T -> [1, n+1, (n+1)^2]^T = B * [1, n, n^2]^T, with
// B = [[1,0,0], [1,1,0], [1,2,1]].

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    unsigned long long N; // target index
    long long a0, a1;     // initial values
    // Example input: N a0 a1 (e.g., 10 1 2)
    cin >> N >> a0 >> a1;
    a0 = (a0 % MOD + MOD) % MOD;
    a1 = (a1 % MOD + MOD) % MOD;

    if (N == 0ULL) { cout << a0 % MOD << "\n"; return 0; }
    if (N == 1ULL) { cout << a1 % MOD << "\n"; return 0; }

    // Transition matrix T of size 5x5
    Matrix T(5);

    // Row for a_{n+1} = 2*a_n + 3*a_{n-1} + P(n+1). Since the state carries the
    // CURRENT basis [1, n, n^2], P(n+1) must be expanded in that basis:
    // 5*(n+1)^2 + 7*(n+1) + 11 = 5*n^2 + 17*n + 23.
    T.a[0][0] = 2;  // a_n coefficient
    T.a[0][1] = 3;  // a_{n-1} coefficient
    T.a[0][2] = 23; // constant term of P(n+1)
    T.a[0][3] = 17; // n term of P(n+1)
    T.a[0][4] = 5;  // n^2 term of P(n+1)

    // Next state's second component is the current a_n
    // (a_n shifts into the a_{n-1} slot).
    T.a[1][0] = 1;

    // Polynomial basis update block B on the lower-right 3x3:
    // [1, n, n^2] -> [1, n+1, (n+1)^2]
    T.a[2][2] = 1;                               // 1 -> 1
    T.a[3][2] = 1; T.a[3][3] = 1;                // n+1 = 1 + n
    T.a[4][2] = 1; T.a[4][3] = 2; T.a[4][4] = 1; // (n+1)^2 = 1 + 2n + n^2

    // Build initial state s_1 = [a_1, a_0, 1, 1, 1]^T (since n=1 here)
    vector<long long> s1 = {a1, a0, 1, 1, 1};

    // We have s_{n+1} = T s_n. To reach s_N from s_1, apply T^{N-1}.
    Matrix P = mpow(T, N - 1);

    // Multiply P by s1
    vector<long long> sN(5, 0);
    for (int i = 0; i < 5; ++i) {
        __int128 acc = 0;
        for (int j = 0; j < 5; ++j) acc += (__int128)P.a[i][j] * s1[j];
        sN[i] = (long long)(acc % MOD);
    }

    long long aN = sN[0] % MOD;
    cout << (aN + MOD) % MOD << "\n";
    return 0;
}

We solve a non-homogeneous recurrence with an additive polynomial P(n) by augmenting the state with [1, n, n^{2}]. The polynomial block updates via binomial identities, producing a constant transition matrix T. Powering T from n=1 to n=N yields a_N in the first component. This pattern generalizes: add n^{0..d} for degree-d polynomials, and extend the companion part for higher-order recurrences.

Time: O((r + d + 1)^3 log N); here r = 2, d = 2, so O(5^3 log N). Space: O((r + d + 1)^2); here O(25).