🗂️Data StructureBeginner

Array and Vector

Key Points

  • Arrays and vectors store elements contiguously, giving O(1) random access via index.
  • std::vector grows automatically and supports amortized O(1) pus, but may reallocate and invalidate pointers/iterators.
  • Use reserve(n) before many insertions to avoid repeated reallocations and to make growth O(n) overall.
  • resize(n) changes the logical size, constructing or destroying elements; reserve(n) only changes capacity.
  • For 2D data, vector<vector<T>> is easy but a flat vector with index r*m + c is usually faster and more cache-friendly.
  • Sorting and unique enable coordinate compression; map values to ranks using lowe in O(log n) per query.
  • Memory capacity typically doubles on growth, keeping , which bounds wasted space.
  • shrin_fit() is non-binding; it may reduce capacity to size, but the standard does not guarantee it.

Prerequisites

  • Big-O notation and amortized analysisTo understand O(1) random access, amortized O(1) push_back, and O(n log n) sorting in compression.
  • Basic C++ syntax, references, and RAIIVectors manage resources automatically; knowing constructors, destructors, and references helps avoid invalidation mistakes.
  • Arrays and pointer arithmeticExplains why contiguous memory enables O(1) access and how index maps to addresses.
  • Binary search and sortingCoordinate compression relies on sort and lower_bound (binary search) for rank mapping.
  • CPU caches and locality (intro)Motivates why flat 2D arrays can be faster than nested vectors.

Detailed Explanation

Tap terms for definitions

01Overview

Arrays and vectors are foundational data structures that store elements contiguously in memory. A plain array has a fixed size, while std::vector is a dynamic array that automatically resizes as you push elements. This contiguous layout gives constant-time random access: reading or writing a[i] does not depend on i. In competitive programming, vectors are the default container for sequences because they balance speed, convenience, and safety. Vectors manage their own memory, track size and capacity, and provide useful operations like push_back, pop_back, insert, erase, reserve, resize, and shrink_to_fit. When a vector runs out of space, it allocates a larger block, moves elements over, and frees the old block. This growth strategy means push_back is amortized O(1), even though a single push can occasionally cost O(n) during reallocation. For 2D grids, you can use vector<vector<T>> for simplicity or a single flat vector with index arithmetic for maximum performance and cache efficiency. Beyond storage, vectors enable common patterns like coordinate compression (discretization), where large or sparse values are mapped to dense ranks so that arrays and Fenwick/segment trees become feasible.

02Intuition & Analogies

Imagine a row of lockers (an array) where each locker has the same size and sits right next to its neighbors. If you know the locker number i, you can walk straight to it in constant time. A plain array is like renting exactly n lockers—cheap and fast—but you can’t easily add more. A std::vector is like having a flexible row of lockers managed by a helpful custodian. When you need more space, the custodian finds a bigger row somewhere else, moves all your items, and hands you the new first locker’s key. Most of the time, adding one more item is instant; occasionally, you wait while the custodian relocates everything—this is why push_back is amortized O(1). The capacity is how many lockers you’ve currently reserved; the size is how many are filled. reserve() politely warns the custodian in advance, so they prepare a big enough row once, which avoids repeated moves. For 2D data, think of a grid of seats. You can number seats by row and column, or you can give every seat a single number by counting across rows: index = row * numCols + col. The second method packs everything in one long row of lockers—fewer keys, fewer trips, and better chances that the stuff you need next is near what you just touched. That locality leads to speed due to CPU caches. Coordinate compression is like replacing long, complicated names with short ID badges so you can keep your directory compact and fast to search.

03Formal Definition

An array is a contiguous block of memory holding elements of the same type. For a base address A and element size s, the address of element i is A + i·s. In C++, std:: models a dynamic array with three core parameters: size n (the number of constructed elements), capacity C (allocated slots, ), and a contiguous storage invariant. Access by operator[] or at() is O(1). When and an insertion at the end occurs, the vector performs a reallocation: it allocates a new buffer of capacity C' (usually C' ≈ 2C), moves or copies the n elements, and then appends the new element. Under a geometric growth policy, the total cost to push N elements is O(N), yielding amortized O(1) per push. For 2D arrays, represents a list of contiguous rows, but the rows themselves may be discontiguous from each other. A flat representation uses a single of length R·M with row-major indexing , preserving one contiguous block. Coordinate compression constructs a strictly increasing array U of unique values from an original array V, and maps any value x in V to rank(x) = lowe(U.begin(), U.end(), x) - U.begin(), enabling O(log n) lookup with O(n log n) preprocessing.

04When to Use

Use std::vector whenever you need a resizable, cache-friendly sequence with fast random access and frequent end insertions. It’s ideal for arrays of known or roughly known sizes (use reserve), prefix sums, dynamic programming tables, adjacency lists (with care), and buffering results. For 2D grids or matrices where performance matters, prefer a flat vector with index arithmetic if dimensions are fixed or known; use vector<vector<T>> for jagged rows or when ease of coding outweighs performance. Coordinate compression is best when values are large, sparse, or arbitrary (like 64-bit IDs or coordinates), but you need to store counts, prefix sums, or segment tree/Fenwick tree indexes in O(n) space. Use resize to set logical size when you want elements constructed (e.g., initializing DP). Use reserve to avoid reallocations when you’ll push_back a known number of elements. Consider shrink_to_fit after building a structure if memory is tight and you won’t add more, but remember it’s non-binding and may not reduce capacity. Avoid vector when you need frequent middle insertions or deletions; a deque or list may be more appropriate, although vectors often still win due to cache effects for moderate edits.

⚠️Common Mistakes

  • Confusing reserve with resize: reserve only changes capacity; size remains the same. Writing to v[i] after reserve without resize is undefined behavior.
  • Ignoring iterator/pointer/reference invalidation on reallocation: push_back or insert may move the buffer, invalidating v.data(), &v[0], and iterators. Reacquire them after operations that might reallocate.
  • Using vector<bool> expecting a normal vector of booleans: it’s a specialized bitset-like proxy with surprising behavior. Prefer vector<char> or vector<uint8_t> for packed bits, or vector<int8_t> for logical flags when simplicity matters.
  • Overusing vector<vector<T>> for dense 2D arrays: rows are discontiguous, leading to cache misses. For performance-critical code, use a flat vector and compute indices.
  • Forgetting to reserve before many push_backs: repeated reallocations increase constant factors; a single reserve(n) can be noticeably faster.
  • Off-by-one and bounds errors: at() checks bounds; operator[] does not. Never access v[v.size()] or negative indices.
  • Assuming shrink_to_fit always frees memory or is O(1): it’s non-binding and may reallocate; plan memory under that uncertainty.
  • Mixing int with size_t, causing signed/unsigned warnings or bugs. Cast carefully or use size_t/ptrdiff_t for sizes and indices.
  • Using references to elements while pushing into the same vector: a reallocation can invalidate references; store indices or re-get references after growth.
  • Sorting without stable requirements: coordinate compression does not need stability, but if relative order within equal elements matters elsewhere, consider stable_sort.

Key Formulas

Random Access Time

Explanation: Indexing into a contiguous array or vector takes constant time because the address is computed with a fixed arithmetic formula. The time does not depend on i or n.

Element Address

Explanation: If A is the base address and s is the element size in bytes, the i-th element’s address is computed by adding i times s. This underlies O(1) random access.

Geometric Series for Amortized Push

Explanation: When capacity doubles, the total number of element moves over n pus is bounded by a geometric series. Therefore the total work is linear, giving amortized O(1) per push.

Capacity Bound Under Geometric Growth

Explanation: With multiplicative growth, capacity stays within a constant factor of size. This bounds memory overhead to a constant factor of n elements.

Row-Major Index

Explanation: For a flat vector representing an R-by-M matrix, the linear index of element (r,c) is r times the number of columns plus c. This lets you store 2D data contiguously.

Coordinate Compression Rank

Explanation: Given sorted unique values U, the rank of x is the first position where x could be inserted. This preserves order and maps values to [0..|U|-1].

Sorting Complexity

Explanation: Sorting n numbers using comparison-based algorithms like std::sort takes O(n log n) time. This dominates coordinate compression preprocessing.

Binary Search Complexity

Explanation: lowe runs in O(log n) time over a sorted range. After compression, each lookup is logarithmic in the number of unique values.

Space of Vector

Explanation: A vector storing n elements uses linear space in n, plus a constant amount of bookkeeping. Capacity may exceed size by a constant factor.

Reallocation Cost

Explanation: When the vector grows past capacity, moving/copying all n elements takes linear time. These occasional costs are averaged in amortized analysis.

Complexity Analysis

For a std::vector with n elements, random access via operator[] or at() is O(1) time and O(1) extra space, since the address of the i-th element is computed by base plus an offset. pus operates in amortized O(1) time: most insertions append into spare capacity, costing O(1), while occasional reallocations cost O(n) to move elements. Over a sequence of N pushes with geometric growth, the total work is O(N). insert or erase at arbitrary positions (not at the end) are O(n) because elements after the position must be shifted. po is O(1). reserve(k) is O(1) if the current capacity is otherwise, it performs one O(n) reallocation. resize(k) is O(n') where n' is the number of elements constructed or destroyed; for trivially constructible types with a given value, construction is linear in the change. clear is O(n) in terms of destructor calls for non-trivial types (effectively O(1) for trivial types). For 2D data with dimensions R and M, a flat vector of size R·M offers O(1) index computation and sequential access that is cache-friendly. supports O(1) access per element as well, but rows may be discontiguous, causing more cache misses during row transitions. Coordinate compression runs in O(n log n) time due to sorting, uses O(n) space for the unique array, and supports O(log n) queries via lowe; replacing lowe with an unordere can yield O(1) average queries at the cost of extra memory and losing order-based properties. Space usage for vector is O(n) elements plus at most a constant-factor overhead due to capacity; under doubling, capacity , keeping memory overhead bounded.

Code Examples

Essential std::vector: reserve vs resize, push_back, iteration
1#include <bits/stdc++.h>
2using namespace std;
3
4int main() {
5 ios::sync_with_stdio(false);
6 cin.tie(nullptr);
7
8 // Suppose we will read n numbers and build their prefix sums.
9 int n = 10;
10 vector<int> a; // size = 0, capacity = 0
11 a.reserve(n); // Ensure at least n capacity; size still 0
12
13 // Push numbers 1..n
14 for (int i = 1; i <= n; ++i) {
15 a.push_back(i); // amortized O(1)
16 }
17
18 // Show size and capacity
19 cout << "size=" << a.size() << " capacity=" << a.capacity() << "\n";
20
21 // Build prefix sums into another vector using resize (we want n elements constructed)
22 vector<long long> pref;
23 pref.resize(n, 0); // logical size becomes n, all elements initialized to 0
24 for (int i = 0; i < n; ++i) {
25 pref[i] = (i == 0 ? 0LL : pref[i-1]) + a[i];
26 }
27
28 cout << "prefix sums: ";
29 for (int i = 0; i < n; ++i) cout << pref[i] << (i+1==n?'\n':' ');
30
31 // Demonstrate at() vs operator[]
32 try {
33 cout << "a.at(3)=" << a.at(3) << "\n"; // bounds-checked
34 // cout << a.at(100) << "\n"; // would throw out_of_range
35 } catch (const out_of_range &e) {
36 cout << "out_of_range: " << e.what() << "\n";
37 }
38
39 // Erase last 3 elements using pop_back
40 for (int k = 0; k < 3; ++k) a.pop_back(); // O(1) each
41 cout << "after pop_back x3, size=" << a.size() << " capacity=" << a.capacity() << "\n";
42
43 // If memory is tight and we won't push more, we can request shrink
44 a.shrink_to_fit(); // non-binding; may reduce capacity
45 cout << "after shrink_to_fit, size=" << a.size() << " capacity=" << a.capacity() << "\n";
46
47 return 0;
48}
49

This program contrasts reserve (capacity only) with resize (changes size) and shows amortized O(1) push_back. It computes prefix sums using indexed access, demonstrates at() with bounds checking, and illustrates that shrink_to_fit may reduce capacity but is not guaranteed. pop_back reduces size without changing capacity.

Time: O(n) overall (building and scanning vectors)Space: O(n) additional space for the vectors
2D arrays: vector<vector<int>> vs flat vector with row-major indexing
1#include <bits/stdc++.h>
2using namespace std;
3
4// Compute a simple 2D operation: sum of 4-neighbors for each cell.
5// Shows both nested vectors and a flat representation.
6
7int main() {
8 ios::sync_with_stdio(false);
9 cin.tie(nullptr);
10
11 int R = 4, C = 5;
12
13 // Approach 1: vector<vector<int>> (easy, but rows are discontiguous)
14 vector<vector<int>> grid(R, vector<int>(C, 1)); // fill with 1s
15
16 auto in_bounds = [&](int r, int c){ return 0 <= r && r < R && 0 <= c && c < C; };
17
18 vector<vector<int>> neigh_sum1(R, vector<int>(C, 0));
19 for (int r = 0; r < R; ++r) {
20 for (int c = 0; c < C; ++c) {
21 int s = 0;
22 const int dr[4] = {-1, 1, 0, 0};
23 const int dc[4] = {0, 0, -1, 1};
24 for (int k = 0; k < 4; ++k) {
25 int nr = r + dr[k], nc = c + dc[k];
26 if (in_bounds(nr, nc)) s += grid[nr][nc];
27 }
28 neigh_sum1[r][c] = s;
29 }
30 }
31
32 // Approach 2: flat vector with row-major indexing (contiguous, cache-friendly)
33 vector<int> flat(R * C, 1);
34 auto idx = [&](int r, int c){ return r * C + c; };
35
36 vector<int> neigh_sum2(R * C, 0);
37 for (int r = 0; r < R; ++r) {
38 for (int c = 0; c < C; ++c) {
39 int s = 0;
40 if (r-1 >= 0) s += flat[idx(r-1, c)];
41 if (r+1 < R) s += flat[idx(r+1, c)];
42 if (c-1 >= 0) s += flat[idx(r, c-1)];
43 if (c+1 < C) s += flat[idx(r, c+1)];
44 neigh_sum2[idx(r, c)] = s;
45 }
46 }
47
48 // Print one cell from each to show same result
49 cout << "neigh_sum1[2][3]=" << neigh_sum1[2][3] << "\n";
50 cout << "neigh_sum2[2,3]=" << neigh_sum2[idx(2,3)] << "\n";
51
52 return 0;
53}
54

Both methods compute the same neighbor-sum on a small grid. vector<vector<int>> is simpler, while a flat vector uses one contiguous block and explicit index math r*C + c, which tends to be faster for large arrays due to better cache locality.

Time: O(R*C)Space: O(R*C)
Coordinate compression (discretization) with std::vector, sort, unique, lower_bound
1#include <bits/stdc++.h>
2using namespace std;
3
4// Build compressed ranks for an array and support mapping and inverse mapping.
5
6int main() {
7 ios::sync_with_stdio(false);
8 cin.tie(nullptr);
9
10 vector<long long> a = {1000, -5, 1000, 42, 42, 7};
11
12 // 1) Build sorted unique values U
13 vector<long long> U = a;
14 sort(U.begin(), U.end());
15 U.erase(unique(U.begin(), U.end()), U.end()); // now strictly increasing
16
17 // 2) Map each element to its rank with lower_bound (binary search)
18 vector<int> rank_of(a.size());
19 for (size_t i = 0; i < a.size(); ++i) {
20 int r = int(lower_bound(U.begin(), U.end(), a[i]) - U.begin());
21 rank_of[i] = r; // 0..U.size()-1
22 }
23
24 // 3) Optionally, build inverse mapping: from rank to value is just U[r]
25
26 // Print results
27 cout << "unique values (U): ";
28 for (auto x : U) cout << x << ' '; cout << "\n";
29 cout << "ranks: ";
30 for (auto r : rank_of) cout << r << ' '; cout << "\n";
31
32 // Example query: compress and decompress a value
33 long long x = 42;
34 int rx = int(lower_bound(U.begin(), U.end(), x) - U.begin());
35 cout << "x=42 -> rank=" << rx << ", inverse(U[rank])=" << U[rx] << "\n";
36
37 // After compression, you can build Fenwick/segment trees with size = U.size()
38
39 return 0;
40}
41

This program performs coordinate compression by creating a sorted unique array U and mapping each original value to its rank via lower_bound. The mapping is order-preserving, turning large or sparse values into dense indices suitable for arrays and indexed data structures.

Time: O(n log n) due to sorting and n lower_bound callsSpace: O(n) for U and ranks
Capacity growth, reallocation, and shrink_to_fit behavior
1#include <bits/stdc++.h>
2using namespace std;
3
4int main() {
5 ios::sync_with_stdio(false);
6 cin.tie(nullptr);
7
8 vector<int> v;
9 v.reserve(1); // start small to observe growth
10
11 const int N = 32; // small demo size; grow and watch capacity and data pointer
12 const int* last_ptr = v.data();
13
14 for (int i = 0; i < N; ++i) {
15 v.push_back(i);
16 const int* cur_ptr = v.data();
17 if (cur_ptr != last_ptr) {
18 cout << "reallocated: size=" << v.size() << " capacity=" << v.capacity() << "\n";
19 last_ptr = cur_ptr;
20 }
21 }
22
23 // Demonstrate iterator invalidation
24 // WARNING: This is just to illustrate; do not use invalid iterators in practice.
25 // Acquire an iterator, then cause possible reallocation, then reacquire.
26 auto it = v.begin();
27 size_t idx = it - v.begin();
28 v.push_back(999); // may reallocate -> 'it' potentially invalid now
29 it = v.begin() + idx; // reacquire using index
30 cout << "value at original index after growth=" << *it << "\n";
31
32 // Show shrink_to_fit possibly reducing capacity
33 size_t old_cap = v.capacity();
34 v.erase(v.begin() + 5, v.end()); // keep only first 5 elements
35 v.shrink_to_fit(); // request capacity reduction
36 cout << "after shrink: size=" << v.size() << " capacity=" << v.capacity() << " (was " << old_cap << ")\n";
37
38 return 0;
39}
40

The program tracks when reallocation happens by comparing the data() pointer after each push_back. It demonstrates that iterators can be invalidated by growth and shows the correct pattern: store an index and reacquire the iterator after modifications. Finally, it requests capacity reduction with shrink_to_fit, which may or may not free memory depending on the implementation.

Time: O(N) pushes; each push is amortized O(1)Space: O(N) for the vector storage