
Fast Exponentiation

Key Points

  • Fast exponentiation (binary exponentiation) computes a^n using repeated squaring in O(log n) multiplications.
  • The idea is to read the binary digits of n; when a bit is 1, multiply the answer by the current power and always square the base.
  • Under a modulus m, reduce after every multiplication to keep numbers small and avoid overflow: (x · y) mod m.
  • It is the backbone of modular arithmetic tasks like modular inverse (for primes) and matrix exponentiation.
  • Iterative implementations are simple, fast, and avoid recursion limits; recursive versions mirror the math neatly.
  • Handle edge cases: n = 0 gives 1; negative n requires reciprocals (not meaningful under mod unless an inverse exists).
  • To prevent overflow for 64-bit moduli, use 128-bit temporaries (e.g., __int128) for the multiplication step.
  • The method generalizes to matrices and other associative operations: power by squaring works as long as multiplication is associative.

Prerequisites

  • Binary representation of integers: binary exponentiation processes the bits of n; understanding bit operations clarifies the algorithm.
  • Modular arithmetic basics: knowing how modulo works and why reductions preserve equivalence is essential for correctness.
  • Integer overflow and data types: preventing overflow before modulo requires choosing appropriate types (e.g., __int128).
  • Associative operations and identities: exponentiation by squaring relies on associativity and the existence of an identity element (1).
  • Recursion and iteration: binary exponentiation can be implemented both ways; understanding loops and call stacks helps.

Detailed Explanation


01 Overview

Fast exponentiation, also called binary exponentiation or exponentiation by squaring, is a technique to compute powers a^n dramatically faster than naive repeated multiplication. Instead of multiplying a by itself n times (which is O(n)), it uses the binary representation of n to reduce the number of multiplications to O(log n). The method repeatedly squares the base and selectively multiplies it into the result when the corresponding bit of the exponent is 1. This approach is especially powerful for modular arithmetic, where we compute a^n mod m and reduce after each multiplication to keep numbers bounded. Binary exponentiation is a core tool in competitive programming, cryptography (e.g., RSA), number theory (modular inverses, Euler’s theorem), and algorithms (matrix exponentiation for linear recurrences). It can be implemented iteratively with a simple while-loop over the bits of n or recursively using divide-and-conquer (a^n = (a^{n/2})^2 for even n and a × a^{n-1} for odd n). The same idea extends to any associative multiplication, such as matrices or transformations.
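The recursive divide-and-conquer form mentioned above can be written down directly. The sketch below is our own illustration (the name mod_pow_rec is not from the full examples later in this section); it mirrors the identities a^n = (a^{n/2})^2 for even n and a^n = a · a^{n−1} for odd n:

```cpp
// Recursive binary exponentiation: a^n mod m in O(log n) multiplications.
// Folds the even case (square the half-power) and the odd case (one extra
// factor of a) into a single recursive step.
long long mod_pow_rec(long long a, unsigned long long n, long long m) {
    if (m == 1) return 0;                  // everything is 0 mod 1
    a %= m; if (a < 0) a += m;             // normalize the base
    if (n == 0) return 1 % m;              // a^0 = 1, the multiplicative identity
    long long half = mod_pow_rec(a, n / 2, m);              // a^{floor(n/2)} mod m
    long long sq = (long long)((__int128)half * half % m);  // square it
    if (n & 1ULL) sq = (long long)((__int128)sq * a % m);   // odd n: one more factor
    return sq;
}
```

The call depth is O(log n), so even 64-bit exponents recurse at most ~64 levels deep.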

02 Intuition & Analogies

Imagine climbing a tall staircase with millions of steps. Naively, you would take one step at a time, which is slow (like multiplying a by itself n times). But if you could take super-steps that double in size—1 step, then 2, then 4, then 8—you could reach very high steps quickly. You would combine only the super-steps that add up exactly to your target. That is what binary exponentiation does with powers: it precomputes a, a^2, a^4, a^8, ... (the “super-steps” via squaring), then picks only the ones that sum to the desired exponent n when written in binary.

Another analogy: think of paying a bill using coins that are powers of two (1, 2, 4, 8, ...). Any amount can be paid using a subset of these coins. Similarly, any exponent n can be decomposed into a sum of powers of two, and a^n becomes a product of selected precomputed squares. In modular arithmetic, reducing after each operation is like ensuring your wallet never overflows—whenever you add or multiply, you immediately take change modulo m to keep values small and safe.

For matrices, the idea is the same: squaring a matrix corresponds to applying a transformation twice, so applying it 2^k times is just k squarings. Combining selected squarings (based on bits of n) applies the transformation exactly n times, but using only O(log n) matrix multiplications.
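The "super-step" picture translates almost line for line into code. This small non-modular sketch (our own illustration, not one of the full examples below) combines only the squared powers whose bits are set in n, e.g., 13 = 1101₂, so a^13 = a^8 · a^4 · a^1:

```cpp
// Compute a^n by walking the bits of n: keep a running "super-step"
// power = a^(2^i), and fold it into the result only when bit i is set.
// Plain integer version with no modulus (so n must stay small enough
// that a^n fits in long long).
long long pow_by_squaring(long long a, unsigned n) {
    long long result = 1;   // multiplicative identity
    long long power = a;    // current super-step: a^(2^i)
    while (n > 0) {
        if (n & 1u) result *= power;  // bit i of n is 1: include a^(2^i)
        power *= power;               // next super-step: a^(2^(i+1))
        n >>= 1u;
    }
    return result;
}
// Example: pow_by_squaring(3, 13) combines 3^8 * 3^4 * 3^1 = 1594323.
```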

03 Formal Definition

Let a be an element of a multiplicative monoid (e.g., integers modulo m, real numbers, or k × k matrices), and let n be a nonnegative integer with binary expansion n = Σ_{i=0}^{t} b_i 2^i, where b_i ∈ {0, 1} and t = ⌊log₂ n⌋. Then

  a^n = ∏_{i : b_i = 1} (a^{2^i}).

Exponentiation by squaring computes the sequence a^{2^i} for i = 0, …, t by repeated squaring, and multiplies a^{2^i} into the accumulator r only when b_i = 1, i.e., r ← r · a^{2^i}. This yields O(log n) multiplications since there are t + 1 bits. For modular exponentiation with modulus m, we compute r ≡ a^n (mod m) using the same process, but after every multiplication or squaring we reduce modulo m: x ← x mod m. Correctness follows from closure and congruence properties in rings: if x ≡ y (mod m) then xz ≡ yz (mod m) and x² ≡ y² (mod m). Negative exponents require a multiplicative inverse: a^{−n} is defined as (a^{−1})^n when a has an inverse (e.g., gcd(a, m) = 1 in Z/mZ).

04 When to Use

  • Modular powers: computing a^n mod m for large n, such as in cryptography (RSA encryption/decryption), discrete logarithm–related tasks, or programming contest problems that ask for large exponents modulo a number.
  • Modular inverse (prime modulus): when m is prime and a is not divisible by m, you can compute a^{-1} mod m as a^{m-2} mod m using Fermat’s Little Theorem via fast exponentiation.
  • Matrix exponentiation: to evaluate linear recurrences (Fibonacci, tribonacci, DP transitions) in O(k^3 log n) time where k is the matrix size; for a fixed 2×2 Fibonacci matrix each multiplication costs O(1), giving O(log n) overall.
  • Powering any associative operator: polynomials under multiplication modulo x^n, affine transformations, or function composition when represented as matrices.
  • When the exponent is huge (e.g., up to 10^{18} or larger) and naive O(n) multiplication is infeasible. Fast exponentiation cuts runtime to O(log n) and is straightforward to implement iteratively.
  • Situations requiring repeated exponentiation with different exponents but same base can also benefit by caching powers of two (binary lifting), though that is a related technique.
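For the last bullet, one possible way to cache the powers of two for a fixed base is sketched below (the struct and its names are our own, not a standard API): precompute base^(2^i) mod m once, then answer each exponent query with only multiplies, no fresh squarings.

```cpp
#include <vector>

// Precompute pow2[i] = base^(2^i) mod m for i = 0..63, then answer many
// exponent queries on the same base in O(log e) multiplications each.
struct FixedBasePow {
    std::vector<long long> pow2;
    long long m;
    FixedBasePow(long long base, long long mod) : pow2(64), m(mod) {
        base %= m; if (base < 0) base += m;
        pow2[0] = base;
        for (int i = 1; i < 64; ++i)   // each entry is the square of the previous
            pow2[i] = (long long)((__int128)pow2[i - 1] * pow2[i - 1] % m);
    }
    long long query(unsigned long long e) const {
        long long r = 1 % m;
        for (int i = 0; i < 64 && (e >> i); ++i)
            if ((e >> i) & 1ULL)       // fold in the cached power for each set bit
                r = (long long)((__int128)r * pow2[i] % m);
        return r;
    }
};
```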

⚠️ Common Mistakes

  • Forgetting to reduce modulo m after each multiplication/squaring, causing overflow or incorrect results. Always do base = base % m first, and reduce every product.
  • Overflow before modulo when m fits in 64 bits but a*b does not. Use 128-bit temporaries (e.g., __int128) or modular multiplication techniques to avoid overflow.
  • Mishandling edge cases: n = 0 should return 1 (the multiplicative identity), even for modular arithmetic; n < 0 is undefined under modulus unless a has an inverse; handle m = 1 which forces all results to 0.
  • Confusing iterative order: the loop should test the current least significant bit (e & 1), multiply result by base when set, then square base and shift e. Swapping steps incorrectly can alter results.
  • Using recursion without tail-call optimization on very large inputs in languages with small recursion limits. Prefer an iterative solution in performance-critical or deep-exponent contexts.
  • Assuming Fermat’s Little Theorem works for composite moduli. For non-prime moduli, use Euler’s theorem with φ(m) only if gcd(a, m) = 1; otherwise the inverse does not exist.

Key Formulas

Binary decomposition of exponent

n = Σ_{i=0}^{t} b_i 2^i with b_i ∈ {0, 1}, so a^n = ∏_{i : b_i = 1} a^{2^i}

Explanation: Express n in binary and multiply only those squared powers whose bits are 1. This shows why only O(log n) multiplications are needed.

Time complexity recurrence

T(n) = T(⌊n/2⌋) + O(1)

Explanation: Each step halves the exponent and does O(1) work (a square and possibly a multiply). Solving gives T(n) = O(log n).

Modular multiplication compatibility

(x · y) mod m = ((x mod m) · (y mod m)) mod m

Explanation: You can reduce before or after multiplying without changing the residue class. This justifies reducing after each operation to avoid overflow.
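A tiny self-check of this identity (an illustrative helper of our own, not part of the main examples below):

```cpp
// Verifies (x * y) mod m == ((x mod m) * (y mod m)) mod m for nonnegative
// inputs. Reducing each factor first keeps the intermediate product below
// m^2 instead of x*y, which is the whole point of reducing eagerly.
bool reduce_then_multiply_agrees(long long x, long long y, long long m) {
    long long direct  = (x * y) % m;              // only safe while x*y fits
    long long reduced = ((x % m) * (y % m)) % m;  // safe whenever m^2 fits
    return direct == reduced;
}
```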

Fermat’s Little Theorem

a^{p−1} ≡ 1 (mod p) for prime p with p ∤ a

Explanation: When p is prime and a is not divisible by p, raising a to p−1 gives 1 modulo p. It underlies computing inverses as a^{p−2} mod p.

Euler’s Theorem

a^{φ(m)} ≡ 1 (mod m) when gcd(a, m) = 1

Explanation: For composite moduli, powers of units cycle with period dividing φ(m). It generalizes Fermat’s theorem to any modulus with coprime base.
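As a sketch of how this yields inverses for composite moduli when gcd(a, m) = 1 (the helpers phi and mod_inverse_euler below are our own names, with a mod_pow like the one in the code examples later):

```cpp
// Euler's totient by trial division over prime factors of m.
long long phi(long long m) {
    long long result = m;
    for (long long p = 2; p * p <= m; ++p) {
        if (m % p == 0) {
            while (m % p == 0) m /= p;
            result -= result / p;   // multiply result by (1 - 1/p)
        }
    }
    if (m > 1) result -= result / m;
    return result;
}

// Standard iterative binary exponentiation, as in the examples below.
long long mod_pow(long long a, unsigned long long e, long long m) {
    if (m == 1) return 0;
    a %= m; if (a < 0) a += m;
    long long r = 1 % m;
    while (e) {
        if (e & 1ULL) r = (long long)((__int128)r * a % m);
        a = (long long)((__int128)a * a % m);
        e >>= 1ULL;
    }
    return r;
}

// Inverse via Euler: a^{phi(m)-1} mod m, valid only when gcd(a, m) == 1.
long long mod_inverse_euler(long long a, long long m) {
    return mod_pow(a, (unsigned long long)(phi(m) - 1), m);
}
```

For example, φ(10) = 4, so the inverse of 3 mod 10 is 3^3 mod 10 = 7, and indeed 3 · 7 ≡ 1 (mod 10).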

Negative exponent (fields)

a^{−n} = (a^{−1})^n = 1 / a^n

Explanation: In fields like the real numbers, negative exponents mean reciprocals. Under a modulus, this requires the multiplicative inverse to exist.

Bit-length of n

bits(n) = ⌊log₂ n⌋ + 1 for n ≥ 1

Explanation: The loop in iterative binary exponentiation runs this many iterations. Each iteration processes one bit of n.

Matrix exponentiation by squaring

A^n = ∏_{i : b_i = 1} A^{2^i} for a k×k matrix A

Explanation: The same binary decomposition applies to matrices, enabling O(k³ log n) time using standard matrix multiplication for k×k matrices.

Complexity Analysis

Binary exponentiation reduces the exponent by half at every step. In the iterative form, each loop iteration processes one bit of n: it conditionally multiplies the accumulator (if the least significant bit is 1) and always squares the base, then shifts the exponent right by one. The number of iterations equals the bit-length of n, which is ⌊log₂ n⌋ + 1. Therefore, the number of multiplications is at most about 2·log₂ n (one square per bit, plus up to one multiply per bit), yielding time complexity O(log n). When applying the method to k×k matrices using standard multiplication, each matrix multiplication is O(k³), so the total time becomes O(k³ log n); for fixed-size matrices such as 2×2, this is effectively O(log n) with a small constant.

Space complexity depends on the implementation. The iterative approach uses O(1) auxiliary space: a few variables for the result, current power, and exponent. The recursive approach has O(log n) call depth due to the divide-and-conquer structure, consuming O(log n) stack space. In modular arithmetic, all intermediate values are reduced modulo m, keeping them bounded by O(log m) bits. However, care must be taken to avoid overflow before reduction; using 128-bit temporaries for intermediate products on 64-bit systems prevents overflow whenever m fits in 64 bits, while still preserving O(1) space usage. For matrix exponentiation, memory is O(k²) to store the matrices, constant with respect to n.
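The multiplication bound can be checked empirically by instrumenting the loop; in this sketch (the counter and pair return are our own additions) the count never exceeds 2 · (⌊log₂ n⌋ + 1):

```cpp
#include <utility>

// Returns {a^n mod m, number of modular multiplications performed}.
// Each of the bit_length(n) iterations does one squaring plus at most one
// multiply into the accumulator. Assumes a >= 0 and m >= 1.
std::pair<long long, int> pow_counting(long long a, unsigned long long n, long long m) {
    long long r = 1 % m, base = a % m;
    int mults = 0;
    while (n > 0) {
        if (n & 1ULL) { r = (long long)((__int128)r * base % m); ++mults; }
        base = (long long)((__int128)base * base % m); ++mults;
        n >>= 1ULL;
    }
    return {r, mults};
}
```

For n = 10 = 1010₂ this performs 4 squarings plus 2 multiplies, i.e., 6 multiplications for an exponent that naive repeated multiplication would handle in 9.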

Code Examples

Iterative modular fast exponentiation with 128-bit-safe multiplication
#include <bits/stdc++.h>
using namespace std;

// Safe modular multiplication for 64-bit operands using 128-bit temporaries.
static inline long long mul_mod(long long a, long long b, long long m) {
    __int128 x = (__int128)(a % m);
    __int128 y = (__int128)(b % m);
    __int128 p = x * y;
    p %= m;
    return (long long)p;
}

// Computes (a^e) mod m in O(log e) time.
long long mod_pow(long long a, unsigned long long e, long long m) {
    if (m == 1) return 0;           // everything is 0 mod 1
    a %= m;                         // reduce base first
    if (a < 0) a += m;              // normalize negative base
    long long result = 1 % m;
    long long base = a;
    unsigned long long exp = e;
    while (exp > 0) {
        if (exp & 1ULL) {
            result = mul_mod(result, base, m);
        }
        base = mul_mod(base, base, m);
        exp >>= 1ULL;               // shift to process next bit
    }
    return result;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    long long a, m; unsigned long long n;
    // Example: compute a^n mod m for multiple test cases
    int T; cin >> T;
    while (T--) {
        cin >> a >> n >> m;
        cout << mod_pow(a, n, m) << "\n";
    }
    return 0;
}

This program implements binary exponentiation modulo m. It processes the bits of n from least significant to most, multiplying the result when a bit is 1 and always squaring the base. mul_mod uses __int128 to avoid overflow before the modulo reduction, allowing moduli up to 9e18 safely on 64-bit platforms.

Time: O(log n). Space: O(1).
Fast power for real numbers (handles negative exponents)
#include <bits/stdc++.h>
using namespace std;

// Computes a^n for real a and integer n using binary exponentiation.
// Negative exponents return reciprocals (1 / a^{|n|}).
double fast_pow(double a, long long n) {
    if (n == 0) return 1.0;
    bool neg = (n < 0);
    // Compute |n| without overflowing when n == LLONG_MIN.
    unsigned long long e = neg ? (unsigned long long)(-(n + 1)) + 1ULL
                               : (unsigned long long)n;
    double base = a;
    double result = 1.0;
    while (e > 0) {
        if (e & 1ULL) result *= base;
        base *= base;
        e >>= 1ULL;
    }
    return neg ? 1.0 / result : result;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    cout.setf(ios::fixed); cout << setprecision(10);
    cout << fast_pow(2.0, 10) << "\n";           // 1024.0
    cout << fast_pow(2.5, -3) << "\n";           // 1/(2.5^3)
    cout << fast_pow(1.0001, 1000000) << "\n";   // large exponent quickly
    return 0;
}

This example computes a^n over real numbers using the iterative binary method and supports negative exponents by returning the reciprocal. It demonstrates the same core idea without modular reduction.

Time: O(log |n|). Space: O(1).
Modular inverse under prime modulus using Fermat’s Little Theorem
#include <bits/stdc++.h>
using namespace std;

static inline long long mul_mod(long long a, long long b, long long m) {
    __int128 x = (__int128)(a % m);
    __int128 y = (__int128)(b % m);
    __int128 p = x * y;
    p %= m;
    return (long long)p;
}

long long mod_pow(long long a, unsigned long long e, long long m) {
    if (m == 1) return 0;
    a %= m; if (a < 0) a += m;
    long long res = 1 % m, base = a;
    while (e) {
        if (e & 1ULL) res = mul_mod(res, base, m);
        base = mul_mod(base, base, m);
        e >>= 1ULL;
    }
    return res;
}

// Simple trial-division primality test (sufficient for typical contest primes like 1e9+7).
bool is_prime(long long p) {
    if (p < 2) return false;
    if (p % 2 == 0) return p == 2;
    for (long long d = 3; d * d <= p; d += 2)
        if (p % d == 0) return false;
    return true;
}

// Computes the modular inverse of a modulo prime p as a^{p-2} mod p.
long long mod_inverse_prime(long long a, long long p) {
    assert(is_prime(p));    // p must be prime
    a %= p; if (a < 0) a += p;
    assert(a != 0);         // inverse does not exist if a ≡ 0 (mod p)
    return mod_pow(a, (unsigned long long)(p - 2), p);
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    long long a = 3, p = 1'000'000'007LL;
    long long inv = mod_inverse_prime(a, p);
    cout << inv << "\n";                 // the inverse of 3 modulo 1e9+7
    // Verify: a * inv ≡ 1 (mod p)
    cout << mul_mod(a, inv, p) << "\n";  // should be 1
    return 0;
}

Using Fermat’s Little Theorem, the modular inverse of a (mod p) for prime p is a^{p−2} mod p. Fast exponentiation computes this efficiently. The example includes a simple primality check and verifies the result.

Time: O(log p) for inversion (dominated by exponentiation). Space: O(1).
2×2 matrix exponentiation to compute Fibonacci numbers modulo m
#include <bits/stdc++.h>
using namespace std;

struct Mat2 {
    long long a, b, c, d; // [[a, b], [c, d]]
};

static inline long long mul_mod(long long x, long long y, long long m) {
    __int128 X = (__int128)(x % m);
    __int128 Y = (__int128)(y % m);
    __int128 P = X * Y;
    P %= m;
    return (long long)P;
}

Mat2 mul(const Mat2 &X, const Mat2 &Y, long long mod) {
    Mat2 R;
    R.a = (mul_mod(X.a, Y.a, mod) + mul_mod(X.b, Y.c, mod)) % mod;
    R.b = (mul_mod(X.a, Y.b, mod) + mul_mod(X.b, Y.d, mod)) % mod;
    R.c = (mul_mod(X.c, Y.a, mod) + mul_mod(X.d, Y.c, mod)) % mod;
    R.d = (mul_mod(X.c, Y.b, mod) + mul_mod(X.d, Y.d, mod)) % mod;
    return R;
}

Mat2 mat_pow(Mat2 base, unsigned long long n, long long mod) {
    Mat2 res{1 % mod, 0, 0, 1 % mod}; // identity matrix
    while (n > 0) {
        if (n & 1ULL) res = mul(res, base, mod);
        base = mul(base, base, mod);
        n >>= 1ULL;
    }
    return res;
}

// Fibonacci via powers of the Q-matrix: [[1,1],[1,0]]^n = [[F_{n+1}, F_n], [F_n, F_{n-1}]]
long long fib_mod(unsigned long long n, long long mod) {
    if (n == 0) return 0 % mod;
    Mat2 Q{1, 1, 1, 0};
    Mat2 P = mat_pow(Q, n - 1, mod);
    // Q^{n-1} = [[F_n, F_{n-1}], [F_{n-1}, F_{n-2}]], so F_n = P.a for n >= 1.
    return (P.a % mod + mod) % mod;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    long long mod = 1'000'000'007LL;
    cout << fib_mod(1, mod) << "\n";          // 1
    cout << fib_mod(10, mod) << "\n";         // 55
    cout << fib_mod(1'000'000, mod) << "\n";  // large n, fast
    return 0;
}

This code raises the Fibonacci Q-matrix [[1,1],[1,0]] to the (n−1)-th power using binary exponentiation. The resulting matrix’s (0,0) entry equals F_n, so we get Fibonacci numbers modulo mod in O(log n) time with constant-sized matrices.

Time: O(log n) with constant matrix size (2×2). Space: O(1) additional space.