Probability Theory
Key Points
- Probability theory formalizes uncertainty using a sample space, events, and a probability measure that obeys clear axioms.
- Random variables are measurable functions that map outcomes to numbers, enabling us to compute expectations and variances.
- Bayes' theorem lets us update beliefs with data, forming the core of many machine learning methods.
- Independence and conditional probability describe how events and variables relate, affecting how we factor and compute probabilities.
- The law of large numbers explains why averages stabilize, and the central limit theorem explains why many sums look Gaussian.
- Common distributions (Bernoulli, Binomial, Poisson, Gaussian, Exponential, Gamma, Beta) model diverse phenomena from counts to waiting times.
- Monte Carlo simulation in C++ using <random> is a practical way to estimate probabilities and expectations when formulas are hard.
- Careful attention to assumptions (independence, proper conditioning, finite variance) avoids common pitfalls in modeling and inference.
Prerequisites
- Set theory and functions – Events are sets; random variables are functions from outcomes to numbers.
- Calculus (integration and differentiation) – Expectations for continuous variables are integrals; densities and MGFs use calculus.
- Sequences and limits – LLN and CLT involve convergence concepts and limit behavior.
- Linear algebra (vectors, covariance matrices) – Multivariate normals, covariance/correlation, and Gaussian models use linear algebra.
- Basic statistics – Interpretation of mean, variance, and confidence/credible intervals relies on statistical concepts.
- C++ programming basics – Implementing simulations requires familiarity with syntax, loops, and standard libraries.
- Pseudorandom number generation – Monte Carlo requires proper RNG engines and distributions in C++.
- Combinatorics – Counting arguments derive PMFs for discrete distributions like the Binomial.
Detailed Explanation
01 Overview
Probability theory is the mathematics of uncertainty. It gives us a language to describe random phenomena, like coin flips, weather, or noisy sensors, and rules to compute meaningful quantities such as the likelihood of an event, the average outcome, and how much results vary. At its core is the Kolmogorov axiomatic framework: we define a sample space of possible outcomes, a collection of events we can talk about (a sigma-algebra), and a probability measure that assigns numbers to events in a consistent way. On top of this foundation, random variables turn outcomes into numbers so that we can compute expectations, variances, and other statistics. Key tools include Bayes' theorem for updating beliefs, independence for factorizing probabilities, and limit theorems (the law of large numbers and the central limit theorem) that explain why averages stabilize and why Gaussian curves appear so often. In machine learning, probability is the backbone: loss functions, generative models, Bayesian inference, and even optimization under uncertainty rely on probabilistic thinking. Practically, when formulas get messy, simulation (Monte Carlo) provides approximate answers, and modern C++ offers robust facilities for random number generation and sampling from standard distributions.
02 Intuition & Analogies
Imagine you're trying to predict if it will rain tomorrow. You don't know for sure, but you can quantify your uncertainty, say a 30% chance. Probability is like a bookkeeping system for uncertainty. The sample space is the list of all possible tomorrows (rain, no rain); events are statements you can test ("it rains"); and the probability measure is a rule that tells you how strongly you believe each event will happen. Random variables are like sensors attached to outcomes: they look at what happened and output a number (1 if it rains, 0 otherwise; or how many millimeters fall). Expectation is a long-run average: if you repeated tomorrow many times in parallel universes, the average of the random variable's outputs is its expectation. Variance measures how wiggly those outputs are around the average. Bayes' theorem is an update rule: you start with a prior belief (chance of rain), observe evidence (dark clouds), and produce a posterior belief (now you think rain is more likely). Independence is like two coin flips that don't influence each other; conditional probability is like asking, "given that clouds are dark, how likely is rain?" The law of large numbers is the reason averages settle down: flip a fair coin thousands of times and the fraction of heads approaches 0.5. The central limit theorem explains why sums of many small, independent effects often look bell-shaped, which is why Gaussian noise appears everywhere, from physics to sensor readings in robotics.
03 Formal Definition
04 When to Use
Use probability theory whenever outcomes are uncertain or subject to noise. In software engineering and simulation, model failure rates, latency spikes, or randomized algorithms using Bernoulli, Exponential, or Gaussian distributions. In data science and machine learning, apply Bayes' theorem for posterior inference, Naive Bayes for classification, and probabilistic models (e.g., Gaussian mixtures) for clustering. For A/B testing, use Binomial and Beta distributions to reason about click-through rates and compute credible intervals. In queuing and reliability, use Poisson processes for counts and Exponential/Gamma distributions for waiting times. When exact formulas are hard or unavailable, use Monte Carlo in C++: sample from <random> distributions to estimate expectations E[g(X)] or probabilities P(A) by averaging indicators. Rely on the law of large numbers to justify convergence of these estimates, and use the central limit theorem to build confidence intervals around them. Use independence assumptions to factor complex joint distributions into simpler pieces; when independence is false, use conditional probabilities and covariance to capture dependencies. Finally, reach for conjugate priors (e.g., Beta-Binomial, Gamma-Poisson) to obtain closed-form Bayesian updates.
⚠️ Common Mistakes
- Confusing P(A\mid B) with P(B\mid A). Bayes' theorem shows they are linked by the prior odds; swapping them without adjustment leads to base-rate fallacies. Avoid this by explicitly writing P(A\mid B) = P(B\mid A)P(A)/P(B).
- Assuming independence without justification. Many signals in real data are correlated; incorrectly factoring P(X,Y) as P(X)P(Y) can severely bias estimates. Check correlations or design experiments that enforce independence.
- Treating densities as probabilities. For continuous X, f(x) can exceed 1 and P(X = x) = 0; only integrals over intervals yield probabilities. Always integrate densities over sets to get probabilities.
- Ignoring variance and uncertainty. Reporting only point estimates (like an average) can be misleading; include variability (variance, standard error) and, when possible, confidence or credible intervals.
- Misusing the central limit theorem. CLT requires independence (or weak dependence) and finite variance; heavy-tailed distributions may violate these, delaying convergence or changing the limiting law. Verify conditions or use robust methods.
- Double counting in the law of total probability. Ensure \{A_i\} form a partition (disjoint and covering the space) before summing P(B\mid A_i)P(A_i).
- Numerical instability and sampling bias. Poor RNG seeding or small sample sizes can produce misleading Monte Carlo results; use high-quality engines (mt19937), sufficient samples, and variance reduction techniques when possible.
Key Formulas
Kolmogorov Axioms (Normalization)
P(\Omega) = 1 and P(\emptyset) = 0
Explanation: The whole space has probability 1 and the empty set has probability 0. These anchor the scale of probability.
Countable Additivity
P(\bigcup_{i=1}^{\infty} A_i) = \sum_{i=1}^{\infty} P(A_i) for pairwise disjoint A_i
Explanation: The probability of a countable union of disjoint events equals the sum of their probabilities. This extends finite additivity to infinite collections.
Complement Rule
P(A^c) = 1 - P(A)
Explanation: The probability that A does not happen is one minus the probability that it does. Useful for computing tail probabilities.
Inclusion–Exclusion (Two Sets)
P(A \cup B) = P(A) + P(B) - P(A \cap B)
Explanation: To avoid double counting overlap, subtract the intersection once. This generalizes to more sets.
Conditional Probability
P(A \mid B) = P(A \cap B) / P(B)
Explanation: Defines the probability of A given that B occurs. Requires P(B) > 0.
Bayes' Theorem
P(A \mid B) = P(B \mid A) P(A) / P(B)
Explanation: Updates prior P(A) with likelihood P(B \mid A) to produce posterior P(A \mid B). The denominator P(B) normalizes over all causes of B.
Law of Total Probability
P(B) = \sum_i P(B \mid A_i) P(A_i)
Explanation: If \{A_i\} forms a partition, the overall probability of B is a weighted sum over the conditional probabilities in each partition cell.
Independence (Events)
P(A \cap B) = P(A) P(B)
Explanation: A and B are independent if knowing one does not change the probability of the other. For random variables, joint densities factor similarly.
Expectation
E[X] = \sum_x x\,p(x) (discrete) or E[X] = \int x\,f(x)\,dx (continuous)
Explanation: Expectation is the long-run average of a random variable. Use a sum for discrete variables and an integral for continuous ones.
Variance via Moments
Var(X) = E[X^2] - (E[X])^2
Explanation: Variance equals the second moment minus the square of the first moment. It measures spread around the mean.
Covariance
Cov(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]
Explanation: Covariance measures the linear dependence between X and Y. Zero covariance does not imply independence in general.
LOTUS
E[g(X)] = \sum_x g(x)\,p(x) or E[g(X)] = \int g(x)\,f(x)\,dx
Explanation: The Law of the Unconscious Statistician lets you compute the expectation of a function of X without explicitly finding the distribution of g(X).
Chebyshev's Inequality
P(|X - \mu| \ge k\sigma) \le 1/k^2
Explanation: Regardless of distribution (with finite variance), the probability of being k or more standard deviations from the mean is at most 1/k^2. Useful for conservative bounds.
Markov's Inequality
P(X \ge a) \le E[X]/a for nonnegative X and a > 0
Explanation: For nonnegative X, the probability of exceeding a threshold is at most the mean divided by that threshold. A simple but general tail bound.
Law of Large Numbers (Strong)
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \to \mu almost surely
Explanation: The sample mean of iid variables with mean \mu converges almost surely to \mu. This justifies Monte Carlo averaging.
Central Limit Theorem
\sqrt{n}\,(\bar{X}_n - \mu)/\sigma \to N(0, 1) in distribution
Explanation: Properly standardized sums of iid variables converge in distribution to a standard normal. Enables approximate confidence intervals.
Moment Generating Function
M_X(t) = E[e^{tX}]
Explanation: The MGF, when it exists near 0, uniquely determines the distribution and encodes all moments via derivatives at 0.
Convolution (Continuous)
f_{X+Y}(z) = \int_{-\infty}^{\infty} f_X(x)\,f_Y(z - x)\,dx
Explanation: The density of a sum of independent continuous variables is the convolution of their densities. The discrete analog uses sums.
Binomial PMF
P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, k = 0, 1, \ldots, n
Explanation: Probability of k successes in n independent Bernoulli trials with success probability p. Models counts in fixed trials.
Poisson PMF
P(X = k) = e^{-\lambda} \lambda^k / k!, k = 0, 1, 2, \ldots
Explanation: Counts events in a fixed interval when events occur independently at rate \lambda. Often used for rare events.
Normal PDF
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)
Explanation: The bell-shaped density with mean \mu and variance \sigma^2. Central in statistics and signal processing.
Exponential PDF
f(x) = \lambda e^{-\lambda x} for x \ge 0
Explanation: Models waiting times with the memoryless property at rate \lambda. Mean and variance are 1/\lambda and 1/\lambda^2.
Complexity Analysis
Code Examples
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Seed a high-quality Mersenne Twister engine
    random_device rd;
    mt19937 gen(rd());

    // Standard normal distribution N(0,1)
    normal_distribution<double> normal01(0.0, 1.0);

    const size_t N = 200000; // number of samples

    // Accumulators for mean and second moment (for variance)
    long double sum = 0.0L;
    long double sumsq = 0.0L;

    // Estimate probability P(|X| <= 1)
    size_t count_in = 0;

    for (size_t i = 0; i < N; ++i) {
        double x = normal01(gen);
        sum += x;
        sumsq += x * x;
        if (fabs(x) <= 1.0) ++count_in;
    }

    long double mean = sum / N;
    long double var = sumsq / N - mean * mean; // Var(X) = E[X^2] - (E[X])^2
    long double p_interval = static_cast<long double>(count_in) / N;

    // Theoretical values for comparison
    // For N(0,1): E[X]=0, Var(X)=1, and P(|X| <= 1) = erf(1/sqrt(2))
    long double theo_mean = 0.0L;
    long double theo_var = 1.0L;
    long double theo_p_interval = erf(1.0L / sqrtl(2.0L));

    cout.setf(ios::fixed);
    cout << setprecision(6);
    cout << "Samples: " << N << "\n";
    cout << "Estimated mean      : " << mean << " (theory: " << theo_mean << ")\n";
    cout << "Estimated variance  : " << var << " (theory: " << theo_var << ")\n";
    cout << "Estimated P(|X|<=1) : " << p_interval << " (theory: " << theo_p_interval << ")\n";

    return 0;
}
This program samples N values from a standard normal distribution using C++ <random>, then estimates the mean, variance, and the probability that |X| <= 1. It compares Monte Carlo estimates to known theoretical values, illustrating E[X], Var(X), and probabilities computed as expectations of indicator functions. By the law of large numbers, the estimates approach the truth as N grows.
#include <bits/stdc++.h>
using namespace std;

// Compute posterior P(Disease | TestPositive) given prior, sensitivity, and specificity
// prior = P(D), sens = P(T+ | D), spec = P(T- | ~D)
double bayes_posterior(double prior, double sens, double spec) {
    double pD = prior;
    double pNotD = 1.0 - pD;
    double pTpos_given_D = sens;
    double pTpos_given_notD = 1.0 - spec; // false positive rate
    double pTpos = pTpos_given_D * pD + pTpos_given_notD * pNotD; // law of total probability
    return (pTpos_given_D * pD) / pTpos;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    double prior = 0.01; // 1% prevalence
    double sens = 0.95;  // sensitivity: 95%
    double spec = 0.90;  // specificity: 90%

    double exact = bayes_posterior(prior, sens, spec);

    // Monte Carlo validation
    random_device rd; mt19937 gen(rd());
    bernoulli_distribution disease(prior);
    bernoulli_distribution tp(sens); // true positive when disease present
    bernoulli_distribution tn(spec); // true negative when disease absent

    const size_t N = 500000; // trials
    size_t test_pos = 0;
    size_t dise_and_pos = 0;

    for (size_t i = 0; i < N; ++i) {
        bool D = disease(gen);
        // with disease, positive with prob sens; without, positive
        // exactly when the test is not a true negative, prob (1 - specificity)
        bool Tpos = D ? tp(gen) : !tn(gen);
        if (Tpos) {
            ++test_pos;
            if (D) ++dise_and_pos;
        }
    }

    double mc = (test_pos > 0) ? (double)dise_and_pos / test_pos : 0.0;

    cout.setf(ios::fixed); cout << setprecision(6);
    cout << "Exact P(D|T+): " << exact << "\n";
    cout << "MC    P(D|T+): " << mc << " (N=" << N << ")\n";

    return 0;
}
We compute the posterior probability of disease after a positive test using Bayes' theorem from prior prevalence, sensitivity, and specificity. We then simulate many patients: draw disease status, generate test results based on sensitivity/specificity, and estimate P(D|T+) as dise_and_pos/test_pos. The Monte Carlo estimate converges to the exact posterior by the law of large numbers.
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    random_device rd; mt19937 gen(rd());
    exponential_distribution<double> exp1(1.0); // mean=1, var=1

    const int n = 50;         // samples per trial (increase to see stronger CLT)
    const int trials = 10000; // number of trials

    // We standardize: Z = sqrt(n) * (Xbar - mu) / sigma, with mu=1, sigma=1 for Exp(1)
    vector<double> Z; Z.reserve(trials);

    for (int t = 0; t < trials; ++t) {
        double s = 0.0;
        for (int i = 0; i < n; ++i) s += exp1(gen);
        double xbar = s / n;
        double z = sqrt((double)n) * (xbar - 1.0) / 1.0;
        Z.push_back(z);
    }

    // Build histogram over [-4,4] with B bins
    const int B = 20;
    const double L = -4.0, R = 4.0;
    vector<int> hist(B, 0);
    for (double z : Z) {
        if (z < L || z >= R) continue; // ignore out of range for display
        int idx = (int)((z - L) / (R - L) * B);
        if (0 <= idx && idx < B) hist[idx]++;
    }

    // Print ASCII histogram
    int maxc = *max_element(hist.begin(), hist.end());
    cout << "Histogram of standardized means (approx N(0,1))\n";
    for (int b = 0; b < B; ++b) {
        double a = L + (R - L) * b / B;
        double c = L + (R - L) * (b + 1) / B;
        int bar = (maxc > 0) ? (50 * hist[b] / maxc) : 0;
        cout << fixed << setprecision(2) << "[" << setw(5) << a << ", " << setw(5) << c << ") ";
        cout << string(bar, '#') << "\n";
    }

    return 0;
}
We repeatedly take averages of n iid Exponential(1) samples, standardize them, and bin the results. By the central limit theorem, the histogram of standardized means approaches the bell curve N(0,1) as n and/or the number of trials grow. This demonstrates how non-Gaussian data (exponential) produce Gaussian-looking averages.