Divide and Conquer DP Optimization
Key Points
- Divide and Conquer DP optimization speeds up DP transitions of the form dp[i][j] = min over k of dp[i-1][k] + C(k, j) when the optimal k is monotone in j.
- The key property needed is monotonicity of argmin: opt[i][j] ≤ opt[i][j+1], which often follows from the Monge (quadrangle) inequality on the cost C.
- We compute each DP row with a recursive routine: solve the midpoint j, search k only in [optL, optR], then recurse on the left and right halves with narrowed ranges.
- This reduces per-row work from O(n²) to O(n log n) when C can be evaluated in O(1), yielding O(k n log n) for k rows.
- Typical applications include partition DP (k segments), 1D k-means with squared distances, text justification penalties, and batching/scheduling with convex costs.
- Implementation details matter: inclusive segment conventions, correct k-range (k < j), valid base cases, and careful handling of infinities and types.
- You can precompute prefix sums (and sometimes prefix sums of squares) to make C(k, j) evaluable in O(1).
- If you can further prove a stronger structure (Knuth optimization), complexity can drop to O(k n); otherwise, D&C gives a reliable O(k n log n).
Prerequisites
- Prefix sums — Enable O(1) segment cost evaluation C(k, j) from cumulative sums (and sums of squares).
- Dynamic programming basics — Understand DP recurrences, base cases, and how rows depend on previous rows.
- Big-O notation — Analyze and compare O(k n²) vs O(k n log n) complexities.
- Divide and conquer recursion — Follow the midpoint-first strategy and correctly recurse on halves with narrowed ranges.
- Convexity and Monge arrays (high level) — Know sufficient conditions that imply monotone argmins for correctness.
- Integer and floating-point precision — Choose correct numeric types for large costs or variance-like costs.
- Argmin/argmax concepts — Work with optimal indices and their monotonicity properties.
Detailed Explanation
Overview
Divide and Conquer (D&C) DP optimization is a technique that accelerates a broad class of dynamic programs that look for a best split point. The classical form is dp[i][j] = min_{k < j} (dp[i-1][k] + C(k, j)), where C(k, j) is a cost for making the last segment end at j after splitting at k. A naive computation scans all k for every j and row i, costing O(k n²). D&C optimization exploits a monotonicity pattern of the optimal split point (argmin) with respect to j: as j grows, the best k never moves backward. Under this property, we can compute each DP row by a divide-and-conquer recursion that solves the middle column first, discovers its optimal k, then solves the left and right halves while shrinking their k-search intervals accordingly.
Intuition & Analogies
Imagine you are placing k walls along a hallway of n rooms to minimize some total cost. For each position j you want to know the best previous wall k. If the best k for room j=50 is 31, it’s unlikely that when you move to room j=51 the best k would suddenly jump far left to, say, 12 (for well-behaved costs). More realistically, the best k slides right or stays put as j increases. This sliding behavior is monotonicity of the argmin. Now think of a detective searching for a suspect’s location along a street. If the suspect’s last known location at the center was house 31, then to the left half of the street you only need to check houses up to 31, and to the right half you only need to check from 31 onward. By first solving the midpoint, we cut the search space for both halves. Repeating this divide-and-conquer strategy means each part of the street inherits a narrower, more relevant search zone. In DP terms, for a given row i, we solve j at the mid, find its best k in [optL, optR], then recurse left with [optL, bestK] and right with [bestK, optR]. The end result is that instead of doing n separate, full scans over k for all j, we do many smaller scans whose total work adds up to about n log n, assuming C is fast to evaluate.
Formal Definition
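Stated precisely, the recurrence, the argmin-monotonicity property that licenses the optimization, and the usual sufficient condition are:

```latex
% Partition-type DP with split index k; C(k, j) is the cost of the segment (k+1..j)
\[
dp[i][j] \;=\; \min_{0 \le k < j} \Big( dp[i-1][k] + C(k, j) \Big)
\]
% Let opt[i][j] be the smallest minimizing k. D&C optimization is valid when, for each fixed i,
\[
opt[i][j] \;\le\; opt[i][j+1] \qquad \text{for all } j,
\]
% which holds in particular when C satisfies the quadrangle (Monge) inequality
\[
C(a, c) + C(b, d) \;\le\; C(a, d) + C(b, c) \qquad \text{for all } a \le b \le c \le d.
\]
```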
When to Use
Use D&C DP optimization when your DP is a partition-type recurrence with an additive structure and a split index k, and you can argue or empirically verify that opt[i][j] is nondecreasing in j for each fixed i. Common scenarios: (1) k-partitioning an array into contiguous segments with convex/Monge costs (e.g., squared sum penalties, convex slack penalties, batching with convex setup + processing costs). (2) 1D clustering (k-means with squared Euclidean distance on a sorted line), where segment cost uses prefix sums of x and x². (3) Text justification/word wrap with a convex penalty of unused space per line (e.g., cube or square), where the best previous break tends to move right as the line end moves right. (4) Some scheduling/queueing DPs with convex delay/holding costs that meet the Monge condition. If you can further show Knuth’s stronger quadrangle inequalities, you may reduce to O(k n), but if that fails while monotonicity still holds, D&C is the safe, general O(k n log n) option. Always ensure C(k, j) can be computed in O(1) or O(log n) via preprocessing (prefix sums, sparse tables), otherwise D&C’s gains may be negated.
⚠️Common Mistakes
- Off-by-one errors in segment definitions: if C(k, j) models the cost of segment (k+1..j), ensure the cost function and prefix sums are consistent with inclusive indices. Forgetting that k < j is common.
- Wrong search-interval propagation: after computing bestK at mid, the left recursion must use [optL, bestK] and the right recursion [bestK, optR]. Swapping or shrinking these incorrectly can break correctness or performance.
- Missing or incorrect base cases: initialize dp[0][0] = 0 and dp[0][j>0] = +∞ (or as the problem dictates). Also clamp the candidate range to [optL, min(optR, mid-1)].
- Assuming monotonicity without checking: if opt[i][j] is not nondecreasing, D&C can produce wrong answers. Either prove the Monge/quadrangle property or empirically verify monotonicity on small instances.
- Cost not O(1): failing to precompute prefix sums (and sums of squares for quadratic costs) yields O(n) cost per query, canceling the optimization.
- Integer overflow and precision: large costs may need 64-bit integers, and adding a cost to an infinity sentinel overflows; costs involving means often require floating point (long double) or rational arithmetic.
- Stack depth: the recursion depth is only O(log n), so stack overflow is rarely an issue, but switch to an explicit stack if your environment has unusually tight limits.
Key Formulas
Partition DP Recurrence
dp[i][j] = min_{0 ≤ k < j} ( dp[i-1][k] + C(k, j) )
Explanation: The best cost to cover the first j items with i groups equals the minimum over all previous cut positions k of the cost up to k plus the cost of the final segment (k+1..j).
Monotone Argmin
opt[i][j] ≤ opt[i][j+1] for all j (i fixed)
Explanation: The index k that minimizes the transition for column j does not decrease when we move to column j+1. This allows shrinking k-search ranges during divide and conquer.
Quadrangle (Monge) Inequality
C(a, c) + C(b, d) ≤ C(a, d) + C(b, c) for all a ≤ b ≤ c ≤ d
Explanation: If the cost obeys this inequality, the induced cost matrix is Monge and argmins are monotone. It’s a common sufficient condition to justify D&C DP optimization.
Row Recursion Cost (Informal)
per level: Σ (candidate range sizes) = O(n); levels: O(log n) ⇒ per row: O(n log n)
Explanation: At each recursion node we solve a midpoint and scan a candidate range of size r. Because sibling ranges overlap only at bestK, the ranges on each level sum to O(n); with O(log n) levels, total per-row work becomes O(n log n).
Range Sum via Prefix Sums
sum(s..e) = P[e] − P[s−1], where P[t] = a[1] + ⋯ + a[t] and P[0] = 0
Explanation: Precomputing prefix sums P lets us get the sum over a segment in O(1). This is crucial to make C(k, j) fast to evaluate.
Squared-Sum Segment Cost
C(s, e) = (P[e] − P[s−1])²
Explanation: If the penalty for a segment is the square of its sum, prefix sums give an O(1) evaluation. Such convex penalties often yield monotonicity.
1D k-means Segment Cost
cost(s, e) = (S2[e] − S2[s−1]) − (S1[e] − S1[s−1])² / (e − s + 1), where S1, S2 are prefix sums of x and x²
Explanation: For squared Euclidean distance to the segment mean, this O(1) formula (using prefix sums of x and x²) gives the optimal within-segment variance cost.
Time Complexity per k Rows
O(k n log n) with D&C vs O(k n²) naive
Explanation: With D&C and O(1) cost evaluation, each of the k rows takes O(n log n), compared to O(n²) naively, for a total of O(k n log n).
Space Complexity (Two Rows)
O(n)
Explanation: Since each row i depends only on row i-1, we can maintain two O(n)-sized arrays for memory efficiency.
Recursion Depth
O(log n)
Explanation: The divide-and-conquer over columns halves the interval each time, so the recursion stack height is logarithmic in n.
Complexity Analysis
Code Examples
```cpp
#include <bits/stdc++.h>
using namespace std;

// We solve: dp[i][j] = min_{0 <= k < j} dp[i-1][k] + (sum(k+1..j))^2
// Monotone argmin typically holds for this convex segment penalty.

static const long long INF = (long long)4e18;

// Compute one DP row using divide & conquer with monotone argmin.
template <class T, class Cost>
void compute_row(int L, int R, int optL, int optR,
                 const vector<T>& dpPrev, vector<T>& dpCur, const Cost& cost) {
    if (L > R) return;
    int mid = (L + R) >> 1;
    pair<T,int> best = {numeric_limits<T>::max(), -1};

    int start = optL;
    int end = min(optR, mid - 1); // enforce k < mid
    for (int k = start; k <= end; ++k) {
        if (dpPrev[k] >= (T)INF) continue; // skip unreachable states (avoids overflow)
        // segment is (k+1 .. mid)
        T cand = dpPrev[k] + cost(k + 1, mid);
        if (cand < best.first) best = {cand, k};
    }

    dpCur[mid] = (best.second == -1 ? (T)INF : best.first);
    int bestK = (best.second == -1 ? optL : best.second); // safety

    // Left half: k cannot exceed bestK
    compute_row<T, Cost>(L, mid - 1, optL, bestK, dpPrev, dpCur, cost);
    // Right half: k cannot be less than bestK
    compute_row<T, Cost>(mid + 1, R, bestK, optR, dpPrev, dpCur, cost);
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n, K;
    // Input: n K, then array a[1..n]
    // Example: 8 3 \n 1 3 -2 4 2 -1 2 1
    if (!(cin >> n >> K)) {
        cerr << "Usage: n K followed by n integers\n";
        return 0;
    }
    vector<long long> a(n + 1);
    for (int i = 1; i <= n; ++i) cin >> a[i];

    // Prefix sums for O(1) range sums
    vector<long long> P(n + 1, 0);
    for (int i = 1; i <= n; ++i) P[i] = P[i - 1] + a[i];

    auto segSum = [&](int s, int e) -> long long {
        return P[e] - P[s - 1]; // inclusive
    };
    auto cost = [&](int s, int e) -> long long {
        long long S = segSum(s, e);
        return S * S; // squared-sum segment cost
    };

    vector<long long> dpPrev(n + 1, INF), dpCur(n + 1, INF);
    // Base: using 0 groups to cover 0 items is 0 cost, otherwise impossible
    dpPrev[0] = 0;

    for (int i = 1; i <= K; ++i) {
        // Compute row i using D&C; valid j are 1..n
        // The k-candidates for j are in [0..j-1]. Start with opt range [0..n-1].
        fill(dpCur.begin(), dpCur.end(), INF);
        compute_row<long long>(1, n, 0, n - 1, dpPrev, dpCur, cost);
        dpPrev.swap(dpCur);
    }

    long long ans = dpPrev[n];
    cout << ans << "\n";
    return 0;
}
```
This program partitions an array into K contiguous segments to minimize the sum of squared segment sums. It precomputes prefix sums so C(k, j) = (sum of a[k+1..j])² is O(1). For each row i, compute_row uses divide and conquer: it finds the optimal k for the midpoint j, then recurses on left and right halves with narrowed opt ranges. The monotone-argmin property holds for many convex/Monge segment penalties like squared sums.
```cpp
#include <bits/stdc++.h>
using namespace std;

// 1D k-means on sorted points x[1..n], squared Euclidean distance to cluster mean.
// dp[g][j] = min_{k<j} dp[g-1][k] + cost(k+1, j)
// cost(s,e) = sum_{t=s}^{e} (x_t - mean)^2 = S2 - S1^2 / len
// Monotone argmin typically holds (Monge structure) for squared distances in 1D.

static const long double INF = 1e300L;

template <class T, class Cost>
void compute_row(int L, int R, int optL, int optR,
                 const vector<T>& dpPrev, vector<T>& dpCur, const Cost& cost) {
    if (L > R) return;
    int mid = (L + R) >> 1;
    pair<T,int> best = {numeric_limits<T>::max(), -1};

    int start = optL;
    int end = min(optR, mid - 1); // k < mid
    for (int k = start; k <= end; ++k) {
        if (dpPrev[k] >= (T)INF) continue; // skip unreachable states
        T cand = dpPrev[k] + cost(k + 1, mid);
        if (cand < best.first) best = {cand, k};
    }

    dpCur[mid] = (best.second == -1 ? (T)INF : best.first);
    int bestK = (best.second == -1 ? optL : best.second);

    compute_row<T, Cost>(L, mid - 1, optL, bestK, dpPrev, dpCur, cost);
    compute_row<T, Cost>(mid + 1, R, bestK, optR, dpPrev, dpCur, cost);
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n, K;
    // Input: n K, then n real positions (assumed sorted; if not, sort first)
    if (!(cin >> n >> K)) {
        cerr << "Usage: n K followed by n positions (sorted)\n";
        return 0;
    }
    vector<long double> x(n + 1);
    for (int i = 1; i <= n; ++i) cin >> x[i];

    // If not sorted, uncomment:
    // sort(x.begin() + 1, x.end());

    // Prefix sums of x and x^2 for O(1) segment variance
    vector<long double> S1(n + 1, 0), S2(n + 1, 0);
    for (int i = 1; i <= n; ++i) {
        S1[i] = S1[i - 1] + x[i];
        S2[i] = S2[i - 1] + x[i] * x[i];
    }
    auto cost = [&](int s, int e) -> long double {
        long double sum = S1[e] - S1[s - 1];
        long double sq = S2[e] - S2[s - 1];
        int len = e - s + 1;
        // variance * len = sum of squared distances to mean
        return sq - (sum * sum) / (long double)len;
    };

    vector<long double> dpPrev(n + 1, INF), dpCur(n + 1, INF);
    dpPrev[0] = 0.0L; // base

    for (int g = 1; g <= K; ++g) {
        fill(dpCur.begin(), dpCur.end(), INF);
        compute_row<long double>(1, n, 0, n - 1, dpPrev, dpCur, cost);
        dpPrev.swap(dpCur);
    }

    cout.setf(ios::fixed);
    cout << setprecision(10) << (double)dpPrev[n] << "\n";
    return 0;
}
```
This program computes the optimal k-clustering of 1D points under squared Euclidean distance. The segment cost is within-cluster variance times cluster size, computed in O(1) via prefix sums of x and x². The D&C routine computes each DP row in O(n log n), giving a total O(K n log n) time. Use long double to reduce precision error; sort the input points if needed.
```cpp
#include <bits/stdc++.h>
using namespace std;

// A small toolkit: (1) reusable D&C row routine (two-row memory),
// (2) optional monotonicity checker for small n to validate correctness during development.

static const long long INF = (long long)4e18;

template <class T, class Cost>
void compute_row_dc(int L, int R, int optL, int optR,
                    const vector<T>& dpPrev, vector<T>& dpCur, const Cost& cost) {
    if (L > R) return;
    int mid = (L + R) >> 1;
    pair<T,int> best = {numeric_limits<T>::max(), -1};
    int start = optL;
    int end = min(optR, mid - 1);
    for (int k = start; k <= end; ++k) {
        if (dpPrev[k] >= (T)INF) continue; // skip unreachable states (avoids overflow)
        T cand = dpPrev[k] + cost(k + 1, mid);
        if (cand < best.first) best = {cand, k};
    }
    dpCur[mid] = (best.second == -1 ? (T)INF : best.first);
    int bestK = (best.second == -1 ? optL : best.second);
    compute_row_dc<T, Cost>(L, mid - 1, optL, bestK, dpPrev, dpCur, cost);
    compute_row_dc<T, Cost>(mid + 1, R, bestK, optR, dpPrev, dpCur, cost);
}

// Brute-force DP (O(n^2)) for a single row and monotonicity check of argmins.
template <class T, class Cost>
bool check_monotone_argmin(int n, const vector<T>& dpPrev, const Cost& cost) {
    vector<int> opt(n + 1, -1);
    for (int j = 1; j <= n; ++j) {
        T best = numeric_limits<T>::max();
        int who = -1;
        for (int k = 0; k < j; ++k) {
            if (dpPrev[k] >= (T)INF) continue; // unreachable state
            T cand = dpPrev[k] + cost(k + 1, j);
            if (cand < best) best = cand, who = k;
        }
        opt[j] = who;
    }
    for (int j = 1; j < n; ++j)
        if (opt[j] > opt[j + 1]) return false;
    return true;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n = 12, K = 3;
    vector<long long> a = {0, 3, -1, 2, 1, -2, 4, 0, 2, -1, 3, -2, 1}; // a[1..n]

    vector<long long> P(n + 1, 0);
    for (int i = 1; i <= n; ++i) P[i] = P[i - 1] + a[i];
    auto cost = [&](int s, int e) -> long long {
        long long S = P[e] - P[s - 1];
        return S * S; // convex segment penalty
    };

    vector<long long> dpPrev(n + 1, INF), dpCur(n + 1, INF);
    dpPrev[0] = 0;

    // Optional: quick sanity check on monotonicity for the first row
    if (!check_monotone_argmin<long long>(n, dpPrev, cost)) {
        cerr << "Warning: argmin may not be monotone; D&C could be invalid.\n";
    }

    for (int g = 1; g <= K; ++g) {
        fill(dpCur.begin(), dpCur.end(), INF);
        compute_row_dc<long long>(1, n, 0, n - 1, dpPrev, dpCur, cost);
        dpPrev.swap(dpCur);
    }

    cout << dpPrev[n] << "\n";
    return 0;
}
```
This snippet provides a reusable D&C row routine plus a small brute-force checker to validate the monotone-argmin property for the current cost on small instances. This helps catch invalid uses early. It demonstrates two-row memory and standard D&C wiring with inclusive segment costs.