⚙️AlgorithmIntermediate

2D Prefix Sum

Key Points

•
A 2D prefix sum (also called an integral image) lets you compute the sum of any axis-aligned sub-rectangle in constant time after O(nm) preprocessing.
•
The key idea is inclusion–exclusion: sums up to the top-left are added twice, so you subtract one overlap to correct it.
•
Use 1-based indexing with a zero-padded top row and left column to eliminate boundary checks and off-by-one errors.
•
To query the sum of [r1..r2] × [c1..c2], combine four prefix values to get the answer in O(1).
•
For many rectangle updates (adding a value to all cells in a rectangle), use a 2D difference array (2D Imos) and reconstruct with a 2D prefix at the end.
•
Always use 64-bit integers (long long) if values or dimensions can be large to avoid overflow.
•
Building the prefix is O(nm) time and O(nm) space; each sum query is O(1), making it ideal for lots of queries.
•
This technique is widely used in competitive programming and image processing to speed up repeated area-sum computations.

Prerequisites

→Arrays and 2D indexing — You need to understand how to access and iterate over matrix elements.
→1D prefix sums — 2D prefix sums generalize the 1D idea; familiarity with 1D helps grasp the extension.
→Inclusion–exclusion principle — Rectangle queries rely on adding and subtracting overlapping areas correctly.
→Big-O notation — To evaluate the trade-off between preprocessing time and query speed.
→Integer overflow and data types — Prefix sums can exceed 32-bit limits; using 64-bit integers is often necessary.
→Input normalization and bounds checking — Ensures query coordinates are valid and prevents out-of-bounds access.

Detailed Explanation

Tap terms for definitions

01Overview

A 2D prefix sum is a preprocessing technique for matrices or grids that allows you to compute the sum of any rectangular subarray in constant time. The method constructs an auxiliary matrix P where P[i][j] stores the sum of all elements in the original matrix A from the top-left corner (1,1) to (i,j), inclusive. Once P is built, the sum of any rectangle can be obtained using a small combination of these stored sums through inclusion–exclusion. The overall strategy is to trade one-time preprocessing work for fast queries: you spend O(nm) time to build P for an n-by-m grid, and each subsequent rectangle-sum query costs only O(1). This yields significant speed-ups when there are many queries. The technique is known under various names: 2D prefix sums, cumulative sums, summed-area tables, or integral images. It is especially useful in competitive programming, image processing (e.g., fast box blur), heatmap analysis, and any problem requiring repeated rectangular aggregations. Variants include max/min prefix structures and extensions that support range updates via the 2D difference array (also known as the 2D Imos method), where updates are applied lazily and finalized with another 2D prefix pass.

02Intuition & Analogies

Imagine a city grid of blocks. At each intersection, there’s a small donation box with some amount of money. If you routinely need to know how much money has been collected in any rectangular neighborhood, counting coin by coin each time is slow. Instead, you keep a running ledger that tells you, for every intersection, the total money in the whole rectangle from the city’s origin (top-left) to that point. With this ledger, to find the money in a particular neighborhood (a rectangle), you look up the total up to its bottom-right corner, then subtract the totals of the regions that lie above and to the left (because they were included too), and finally add back the overlapping top-left corner region that was subtracted twice. This is the same mental math you do when figuring out areas of overlapping shapes: add the big chunk, subtract the extras, and fix the double subtraction. The 2D difference array (update trick) has a similar intuition. Think of placing four signs at a rectangle’s corners: two pluses and two minuses. If you later walk the grid and keep a running cumulative tally first across rows and then down columns, the effects of these corner signs expand to fill the whole rectangle with the intended added value. This way you don’t have to add to every cell immediately—you just leave corner markers and let a final sweep propagate the updates everywhere they belong.

03Formal Definition

Let A be an n×m matrix (n rows, m columns). Define the 2D prefix sum matrix P of size (n+1)×(m+1) with a zero-padded border such that P[0][j] = 0 and P[i][0] = 0 for all i,j. For 1 ≤

i \leq n

and 1 ≤

j \leq m

, define: P[i][j] = A[i][j] + P[i-1][j] + P[i][j-1] - P[i-1][j-1]. Then, for any rectangle with 1 ≤ r1 ≤ r2 ≤ n and 1 ≤ c1 ≤ c2 ≤ m, the sum S of elements in A over rows r1..r2 and columns c1..c2 is:

S = P [r 2] [c 2] - P [r 1 - 1] [c 2] - P [r 2] [c 1 - 1] + P [r 1 - 1] [c 1 - 1]

. This follows directly from inclusion–exclusion: P[r2][c2] includes the desired rectangle plus extra regions; subtracting the top and left prefixes removes those extras, and the top-left overlap is added back once. For range updates, the 2D difference array D is an auxiliary (n+2)×(m+2) matrix initialized to zeros. To add a value v to the rectangle [r1..r2]×[c1..c2], perform: D[r1][c1] += v, D[r1][c2+1] -= v, D[r2+1][c1] -= v, D[r2+1][c2+1] += v. After applying all updates, reconstruct the total added values by computing the 2D prefix sums over D and add them elementwise to A.

04When to Use

Use 2D prefix sums when you have many queries asking for the sum of values in axis-aligned rectangles of a static grid. Typical scenarios include competitive programming problems with large n, m (up to 10^3–10^4) and many queries (10^4–10^5), heatmap or density analysis where quick area sums are required, and image processing tasks like box blurs or integral image features where filters correspond to rectangular kernels. If you also need to perform many rectangle increment updates before answering queries, use the 2D difference (Imos) method: log all updates in O(1) each and finalize with a single O(nm) sweep, then answer queries via a prefix table. If both updates and queries interleave online (updates must be reflected immediately), 2D prefix sums alone are insufficient—you may need 2D Binary Indexed Trees (Fenwick) or 2D Segment Trees, which handle updates and queries in O(log^2 n). If memory is tight or values can overflow 32-bit integers, plan for 64-bit types and possibly coordinate compression. Choose 1-based indexing with padding to simplify boundaries and reduce bugs.

⚠️Common Mistakes

Common pitfalls include off-by-one indexing errors: mixing 0-based arrays with formulas that assume 1-based indexing leads to incorrect sums. A robust pattern is to store P with size (n+1)×(m+1) and keep row 0 and column 0 as zeros, then use 1-based indices for A when filling P. Another frequent error is forgetting the inclusion–exclusion sign pattern in queries: the formula must be + − − + in that exact arrangement; any deviation yields wrong answers. Many solutions also neglect integer overflow—prefix sums can be significantly larger than individual elements, so use 64-bit types (long long) for A, P, and intermediate results. In 2D difference arrays, placing the four corner updates incorrectly (mixing c2 with c2+1, or missing one corner) silently corrupts the final grid; always follow the +, −, −, + corner rule. Developers sometimes rebuild the prefix after every single update, losing the performance benefit; batch updates with a difference array and rebuild once. Finally, not normalizing query coordinates (ensuring r1 ≤ r2 and c1 ≤ c2) or not clamping them to valid ranges can cause out-of-bounds access or logic errors.

Key Formulas

2D Prefix Recurrence

P_{i, j} = A_{i, j} + P_{i - 1, j} + P_{i, j - 1} - P_{i - 1, j - 1}

Explanation: Each prefix sum equals the current cell plus the prefixes above and to the left, minus the double-counted overlap. This is the core relation used to build the 2D prefix table.

Rectangle Sum Query

S (r_{1} .. r_{2}, c_{1} .. c_{2}) = P_{r_{2}, c_{2}} - P_{r_{1} - 1, c_{2}} - P_{r_{2}, c_{1} - 1} + P_{r_{1} - 1, c_{1} - 1}

Explanation: The sum of a submatrix is obtained by inclusion–exclusion using four prefix values. The signs are +, −, −, + in that order.

Time Complexity

T_{build} (n, m) = O (nm), T_{query} = O (1)

Explanation: Building the prefix table touches each cell once, while each query uses a constant number of arithmetic operations regardless of n and m.

2D Difference Update Rule

D [r_{1}, c_{1}] + = v, D [r_{1}, c_{2} + 1] - = v, D [r_{2} + 1, c_{1}] - = v, D [r_{2} + 1, c_{2} + 1] + = v

Explanation: To add v to a rectangle, adjust four corners of the difference array. A subsequent 2D prefix over D spreads these values to exactly the intended rectangle.

Reconstruction After Updates

A_{i, j}^{'} = A_{i, j} + x = 1 \sum i y = 1 \sum j D_{x, y}

Explanation: After all updates are logged in D, compute its 2D prefix to get the total added value at each cell, then add it to the original A to obtain the final grid A'.

Space Complexity

Mem (P) = O (nm)

Explanation: The prefix table requires storage proportional to the number of matrix cells. Using 1-based padding adds only O(n + m) extra, which is asymptotically the same.

k-by-k Block Sum

S_{k \times k} (i, j) = P_{i + k - 1, j + k - 1} - P_{i - 1, j + k - 1} - P_{i + k - 1, j - 1} + P_{i - 1, j - 1}

Explanation: The sum of the k×k submatrix with top-left at (i,j) is computed with four prefix lookups, enabling enumeration of all such blocks in O(nm).

Complexity Analysis

Let n be the number of rows and m the number of columns. Building the 2D prefix sum table P requires a single pass over all cells. For each cell, we perform a constant number of arithmetic operations (four references and three additions/subtractions), so the build time is O(nm). The space used to store P is also O(nm); using a zero-padded extra row and column increases memory by O(n + m), which does not change the asymptotic bound. Once P is built, answering any rectangle-sum query needs exactly four table lookups and three arithmetic operations, which is O(1) time per query. Therefore, for Q queries on a static grid, the total time is O(nm + Q). For batch rectangle updates, the 2D difference array D supports each update in O(1) by modifying four corners. After applying U updates, a single 2D prefix over D reconstructs the cumulative additions in O(nm), and adding them to A takes O(nm). If you then build a prefix table for the updated grid to support queries, that adds another O(nm) pass. Hence the total time becomes O(U + nm) for updates plus O(nm + Q) if queries follow; typically summarized as O(U + nm + Q). The extra space for D is O(nm), the same order as P. For online interleaving of arbitrary updates and queries, 2D prefix sums are not sufficient in sublinear time per operation; data structures like 2D Fenwick trees achieve O(lo

g^{2}

n) per update/query but are more complex and have larger constants.

Code Examples

Build 2D Prefix Sum and Answer Rectangle Sum Queries (O(1) per query)

1 #include <bits/stdc++.h>
2 using namespace std;
3 
4 // Build a 2D prefix sum with 1-based indexing and zero padding
5 // P[i][j] = sum of A[1..i][1..j]
6 
7 int main() {
8     ios::sync_with_stdio(false);
9     cin.tie(nullptr);
10 
11     int n, m, q;
12     // Input format:
13     // n m
14     // n lines of m integers: the grid A
15     // q (number of queries)
16     // q lines: r1 c1 r2 c2 (1-based, inclusive)
17     if (!(cin >> n >> m)) return 0;
18     vector<vector<long long>> A(n + 1, vector<long long>(m + 1, 0));
19     for (int i = 1; i <= n; ++i) {
20         for (int j = 1; j <= m; ++j) {
21             cin >> A[i][j];
22         }
23     }
24     cin >> q;
25 
26     // Build prefix table P with zero-padded row 0 and column 0
27     vector<vector<long long>> P(n + 1, vector<long long>(m + 1, 0));
28     for (int i = 1; i <= n; ++i) {
29         for (int j = 1; j <= m; ++j) {
30             P[i][j] = A[i][j] + P[i - 1][j] + P[i][j - 1] - P[i - 1][j - 1];
31         }
32     }
33 
34     auto rect_sum = [&](int r1, int c1, int r2, int c2) -> long long {
35         // Normalize coordinates (ensure r1 <= r2 and c1 <= c2)
36         if (r1 > r2) swap(r1, r2);
37         if (c1 > c2) swap(c1, c2);
38         // Clamp to valid range if needed (optional safety)
39         r1 = max(1, min(r1, n)); r2 = max(1, min(r2, n));
40         c1 = max(1, min(c1, m)); c2 = max(1, min(c2, m));
41         // Inclusion–exclusion using the prefix table
42         return P[r2][c2] - P[r1 - 1][c2] - P[r2][c1 - 1] + P[r1 - 1][c1 - 1];
43     };
44 
45     while (q--) {
46         int r1, c1, r2, c2;
47         cin >> r1 >> c1 >> r2 >> c2;
48         cout << rect_sum(r1, c1, r2, c2) << '\n';
49     }
50     return 0;
51 }
52

This program reads an n×m grid, constructs a zero-padded 2D prefix sum table P in O(nm), and answers each rectangle sum query in O(1) time using the inclusion–exclusion formula. Using 1-based indexing and padding avoids boundary checks for r1−1 and c1−1.

Time: O(nm + q)Space: O(nm)

2D Difference Array (Imos) for Batch Rectangle Updates, then Query with 2D Prefix

1 #include <bits/stdc++.h>
2 using namespace std;
3 
4 // Apply many rectangle additions in O(1) each using a 2D difference array.
5 // After all updates, reconstruct the final grid and build a 2D prefix for queries.
6 
7 int main() {
8     ios::sync_with_stdio(false);
9     cin.tie(nullptr);
10 
11     int n, m; 
12     if (!(cin >> n >> m)) return 0;
13 
14     // Read initial grid A (can be all zeros if desired)
15     vector<vector<long long>> A(n + 1, vector<long long>(m + 1, 0));
16     for (int i = 1; i <= n; ++i) {
17         for (int j = 1; j <= m; ++j) {
18             cin >> A[i][j];
19         }
20     }
21 
22     int u; // number of updates
23     cin >> u;
24 
25     // Difference array with extra padding to allow r2+1, c2+1 updates safely
26     vector<vector<long long>> D(n + 2, vector<long long>(m + 2, 0));
27 
28     // Each update: r1 c1 r2 c2 v (add v to all A[r1..r2][c1..c2])
29     for (int k = 0; k < u; ++k) {
30         int r1, c1, r2, c2; long long v;
31         cin >> r1 >> c1 >> r2 >> c2 >> v;
32         if (r1 > r2) swap(r1, r2);
33         if (c1 > c2) swap(c1, c2);
34         // Corner updates: + - - +
35         D[r1][c1] += v;
36         D[r1][c2 + 1] -= v;
37         D[r2 + 1][c1] -= v;
38         D[r2 + 1][c2 + 1] += v;
39     }
40 
41     // Reconstruct the total added value at each cell by 2D prefix over D
42     for (int i = 1; i <= n; ++i) {
43         for (int j = 1; j <= m; ++j) {
44             D[i][j] = D[i][j] + D[i - 1][j] + D[i][j - 1] - D[i - 1][j - 1];
45         }
46     }
47 
48     // Final grid after updates: A' = A + D (D now holds cumulative additions)
49     vector<vector<long long>> B(n + 1, vector<long long>(m + 1, 0));
50     for (int i = 1; i <= n; ++i) {
51         for (int j = 1; j <= m; ++j) {
52             B[i][j] = A[i][j] + D[i][j];
53         }
54     }
55 
56     // Build prefix table on the updated grid B
57     vector<vector<long long>> P(n + 1, vector<long long>(m + 1, 0));
58     for (int i = 1; i <= n; ++i) {
59         for (int j = 1; j <= m; ++j) {
60             P[i][j] = B[i][j] + P[i - 1][j] + P[i][j - 1] - P[i - 1][j - 1];
61         }
62     }
63 
64     int q; // number of queries on the updated grid
65     cin >> q;
66 
67     auto rect_sum = [&](int r1, int c1, int r2, int c2) -> long long {
68         if (r1 > r2) swap(r1, r2);
69         if (c1 > c2) swap(c1, c2);
70         r1 = max(1, min(r1, n)); r2 = max(1, min(r2, n));
71         c1 = max(1, min(c1, m)); c2 = max(1, min(c2, m));
72         return P[r2][c2] - P[r1 - 1][c2] - P[r2][c1 - 1] + P[r1 - 1][c1 - 1];
73     };
74 
75     while (q--) {
76         int r1, c1, r2, c2;
77         cin >> r1 >> c1 >> r2 >> c2;
78         cout << rect_sum(r1, c1, r2, c2) << '\n';
79     }
80 
81     return 0;
82 }
83

We batch-apply U rectangle updates using the 2D difference array D in O(1) per update. A single 2D prefix over D reconstructs the cumulative additions. We add this to A to get B and build a prefix table P over B to answer subsequent queries in O(1).

Time: O(nm + u + nm + q) = O(nm + u + q)Space: O(nm)

Compute All k×k Submatrix Sums Using 2D Prefix in O(nm)

1 #include <bits/stdc++.h>
2 using namespace std;
3 
4 // Given an n x m grid and a block size k, compute the sum of every k x k submatrix.
5 // Outputs an (n-k+1) x (m-k+1) matrix of sums.
6 
7 int main() {
8     ios::sync_with_stdio(false);
9     cin.tie(nullptr);
10 
11     int n, m, k;
12     if (!(cin >> n >> m >> k)) return 0;
13     vector<vector<long long>> A(n + 1, vector<long long>(m + 1, 0));
14     for (int i = 1; i <= n; ++i) {
15         for (int j = 1; j <= m; ++j) cin >> A[i][j];
16     }
17 
18     // Build 2D prefix
19     vector<vector<long long>> P(n + 1, vector<long long>(m + 1, 0));
20     for (int i = 1; i <= n; ++i) {
21         for (int j = 1; j <= m; ++j) {
22             P[i][j] = A[i][j] + P[i - 1][j] + P[i][j - 1] - P[i - 1][j - 1];
23         }
24     }
25 
26     int out_n = n - k + 1, out_m = m - k + 1;
27     if (out_n <= 0 || out_m <= 0) {
28         // No k x k submatrix fits
29         cout << 0 << '\n';
30         return 0;
31     }
32 
33     vector<vector<long long>> S(out_n + 1, vector<long long>(out_m + 1, 0));
34 
35     auto rect_sum = [&](int r1, int c1, int r2, int c2) -> long long {
36         return P[r2][c2] - P[r1 - 1][c2] - P[r2][c1 - 1] + P[r1 - 1][c1 - 1];
37     };
38 
39     for (int i = 1; i <= out_n; ++i) {
40         for (int j = 1; j <= out_m; ++j) {
41             int r1 = i, c1 = j;
42             int r2 = i + k - 1, c2 = j + k - 1;
43             S[i][j] = rect_sum(r1, c1, r2, c2);
44         }
45     }
46 
47     // Output the (n-k+1) x (m-k+1) matrix of kxk sums
48     for (int i = 1; i <= out_n; ++i) {
49         for (int j = 1; j <= out_m; ++j) {
50             if (j > 1) cout << ' ';
51             cout << S[i][j];
52         }
53         cout << '\n';
54     }
55 
56     return 0;
57 }
58

After building the 2D prefix table, each k×k submatrix sum is a constant-time four-corner lookup. Enumerating all top-left positions yields O(nm) total time, which is optimal for producing all outputs.

Time: O(nm)Space: O(nm)

1	#include <bits/stdc++.h>
2	using namespace std;
3
4	// Build a 2D prefix sum with 1-based indexing and zero padding
5	// P[i][j] = sum of A[1..i][1..j]
6
7	int main() {
8	ios::sync_with_stdio(false);
9	cin.tie(nullptr);
10
11	int n, m, q;
12	// Input format:
13	// n m
14	// n lines of m integers: the grid A
15	// q (number of queries)
16	// q lines: r1 c1 r2 c2 (1-based, inclusive)
17	if (!(cin >> n >> m)) return 0;
18	vector<vector<long long>> A(n + 1, vector<long long>(m + 1, 0));
19	for (int i = 1; i <= n; ++i) {
20	for (int j = 1; j <= m; ++j) {
21	cin >> A[i][j];
22	}
23	}
24	cin >> q;
25
26	// Build prefix table P with zero-padded row 0 and column 0
27	vector<vector<long long>> P(n + 1, vector<long long>(m + 1, 0));
28	for (int i = 1; i <= n; ++i) {
29	for (int j = 1; j <= m; ++j) {
30	P[i][j] = A[i][j] + P[i - 1][j] + P[i][j - 1] - P[i - 1][j - 1];
31	}
32	}
33
34	auto rect_sum = [&](int r1, int c1, int r2, int c2) -> long long {
35	// Normalize coordinates (ensure r1 <= r2 and c1 <= c2)
36	if (r1 > r2) swap(r1, r2);
37	if (c1 > c2) swap(c1, c2);
38	// Clamp to valid range if needed (optional safety)
39	r1 = max(1, min(r1, n)); r2 = max(1, min(r2, n));
40	c1 = max(1, min(c1, m)); c2 = max(1, min(c2, m));
41	// Inclusion–exclusion using the prefix table
42	return P[r2][c2] - P[r1 - 1][c2] - P[r2][c1 - 1] + P[r1 - 1][c1 - 1];
43	};
44
45	while (q--) {
46	int r1, c1, r2, c2;
47	cin >> r1 >> c1 >> r2 >> c2;
48	cout << rect_sum(r1, c1, r2, c2) << '\n';
49	}
50	return 0;
51	}
52

1	#include <bits/stdc++.h>
2	using namespace std;
3
4	// Apply many rectangle additions in O(1) each using a 2D difference array.
5	// After all updates, reconstruct the final grid and build a 2D prefix for queries.
6
7	int main() {
8	ios::sync_with_stdio(false);
9	cin.tie(nullptr);
10
11	int n, m;
12	if (!(cin >> n >> m)) return 0;
13
14	// Read initial grid A (can be all zeros if desired)
15	vector<vector<long long>> A(n + 1, vector<long long>(m + 1, 0));
16	for (int i = 1; i <= n; ++i) {
17	for (int j = 1; j <= m; ++j) {
18	cin >> A[i][j];
19	}
20	}
21
22	int u; // number of updates
23	cin >> u;
24
25	// Difference array with extra padding to allow r2+1, c2+1 updates safely
26	vector<vector<long long>> D(n + 2, vector<long long>(m + 2, 0));
27
28	// Each update: r1 c1 r2 c2 v (add v to all A[r1..r2][c1..c2])
29	for (int k = 0; k < u; ++k) {
30	int r1, c1, r2, c2; long long v;
31	cin >> r1 >> c1 >> r2 >> c2 >> v;
32	if (r1 > r2) swap(r1, r2);
33	if (c1 > c2) swap(c1, c2);
34	// Corner updates: + - - +
35	D[r1][c1] += v;
36	D[r1][c2 + 1] -= v;
37	D[r2 + 1][c1] -= v;
38	D[r2 + 1][c2 + 1] += v;
39	}
40
41	// Reconstruct the total added value at each cell by 2D prefix over D
42	for (int i = 1; i <= n; ++i) {
43	for (int j = 1; j <= m; ++j) {
44	D[i][j] = D[i][j] + D[i - 1][j] + D[i][j - 1] - D[i - 1][j - 1];
45	}
46	}
47
48	// Final grid after updates: A' = A + D (D now holds cumulative additions)
49	vector<vector<long long>> B(n + 1, vector<long long>(m + 1, 0));
50	for (int i = 1; i <= n; ++i) {
51	for (int j = 1; j <= m; ++j) {
52	B[i][j] = A[i][j] + D[i][j];
53	}
54	}
55
56	// Build prefix table on the updated grid B
57	vector<vector<long long>> P(n + 1, vector<long long>(m + 1, 0));
58	for (int i = 1; i <= n; ++i) {
59	for (int j = 1; j <= m; ++j) {
60	P[i][j] = B[i][j] + P[i - 1][j] + P[i][j - 1] - P[i - 1][j - 1];
61	}
62	}
63
64	int q; // number of queries on the updated grid
65	cin >> q;
66
67	auto rect_sum = [&](int r1, int c1, int r2, int c2) -> long long {
68	if (r1 > r2) swap(r1, r2);
69	if (c1 > c2) swap(c1, c2);
70	r1 = max(1, min(r1, n)); r2 = max(1, min(r2, n));
71	c1 = max(1, min(c1, m)); c2 = max(1, min(c2, m));
72	return P[r2][c2] - P[r1 - 1][c2] - P[r2][c1 - 1] + P[r1 - 1][c1 - 1];
73	};
74
75	while (q--) {
76	int r1, c1, r2, c2;
77	cin >> r1 >> c1 >> r2 >> c2;
78	cout << rect_sum(r1, c1, r2, c2) << '\n';
79	}
80
81	return 0;
82	}
83

1	#include <bits/stdc++.h>
2	using namespace std;
3
4	// Given an n x m grid and a block size k, compute the sum of every k x k submatrix.
5	// Outputs an (n-k+1) x (m-k+1) matrix of sums.
6
7	int main() {
8	ios::sync_with_stdio(false);
9	cin.tie(nullptr);
10
11	int n, m, k;
12	if (!(cin >> n >> m >> k)) return 0;
13	vector<vector<long long>> A(n + 1, vector<long long>(m + 1, 0));
14	for (int i = 1; i <= n; ++i) {
15	for (int j = 1; j <= m; ++j) cin >> A[i][j];
16	}
17
18	// Build 2D prefix
19	vector<vector<long long>> P(n + 1, vector<long long>(m + 1, 0));
20	for (int i = 1; i <= n; ++i) {
21	for (int j = 1; j <= m; ++j) {
22	P[i][j] = A[i][j] + P[i - 1][j] + P[i][j - 1] - P[i - 1][j - 1];
23	}
24	}
25
26	int out_n = n - k + 1, out_m = m - k + 1;
27	if (out_n <= 0 \|\| out_m <= 0) {
28	// No k x k submatrix fits
29	cout << 0 << '\n';
30	return 0;
31	}
32
33	vector<vector<long long>> S(out_n + 1, vector<long long>(out_m + 1, 0));
34
35	auto rect_sum = [&](int r1, int c1, int r2, int c2) -> long long {
36	return P[r2][c2] - P[r1 - 1][c2] - P[r2][c1 - 1] + P[r1 - 1][c1 - 1];
37	};
38
39	for (int i = 1; i <= out_n; ++i) {
40	for (int j = 1; j <= out_m; ++j) {
41	int r1 = i, c1 = j;
42	int r2 = i + k - 1, c2 = j + k - 1;
43	S[i][j] = rect_sum(r1, c1, r2, c2);
44	}
45	}
46
47	// Output the (n-k+1) x (m-k+1) matrix of kxk sums
48	for (int i = 1; i <= out_n; ++i) {
49	for (int j = 1; j <= out_m; ++j) {
50	if (j > 1) cout << ' ';
51	cout << S[i][j];
52	}
53	cout << '\n';
54	}
55
56	return 0;
57	}
58