Math · Intermediate

Bayes' Theorem

Key Points

  • Bayes' Theorem tells you how to update the probability of a hypothesis after seeing new evidence.
  • It combines prior belief P(A) with the likelihood of the evidence P(B|A) and normalizes by the total probability of the evidence P(B).
  • In practice you often use the law of total probability to compute P(B) when multiple hypotheses are possible.
  • Naive Bayes assumes features are conditionally independent given the class, making P(x1,...,xd|C) factor into a product that is easy to compute.
  • Numerical stability is crucial; compute in log-space and use log-sum-exp to avoid underflow with many features or tiny probabilities.
  • Base rates matter: even accurate tests can yield many false positives when the condition is rare.
  • Bayesian updating can be done iteratively as new evidence arrives, which is useful in streaming and online settings.
  • In competitive programming, careful normalization and precision handling often decide correctness on Bayesian inference tasks.

Prerequisites

  • Basic probability and conditional probability: Bayes' Theorem directly relates conditional probabilities and requires comfort with P(A), P(B|A), and complements.
  • Summations and combinatorics: total probability and Beta-Binomial formulas involve sums and combinations.
  • Logarithms and exponentials: stable computation relies on log-space and the log-sum-exp identity.
  • C++ vectors, maps, and strings: implementing Naive Bayes and hypothesis enumeration uses these data structures.
  • Floating-point precision: understanding underflow/overflow and numerical stability is essential for probability computations.

Detailed Explanation


01. Overview

Bayes' Theorem is a rule for updating beliefs in light of new evidence. Imagine you have a hypothesis A (e.g., an email is spam), and you observe evidence B (e.g., the email contains the word "prize"). Bayes' Theorem tells you how to compute the updated probability P(A|B) by blending how plausible A was before seeing B (the prior, P(A)) with how compatible B is with A (the likelihood, P(B|A)), and normalizing by how likely B is overall (the evidence, P(B)). This single equation underlies a wide range of applications: medical test interpretation, spam filtering, recommender systems, A/B testing, and probabilistic reasoning problems in programming contests. The theorem connects conditional probabilities in both directions (A given B and B given A) and clarifies why base rates matter. In computation, we often handle multiple hypotheses A_i and compute a full posterior distribution P(A_i|B) across them. Practical implementations must guard against floating-point underflow; therefore, probabilities are frequently manipulated in logarithmic form, and normalization uses the log-sum-exp trick. Bayes' Theorem is not just a formula; it is a thinking framework: start with a prior, gather evidence via a likelihood model, and produce a posterior that becomes the new prior for future updates.

02. Intuition & Analogies

Think of a detective updating a suspect list as clues arrive. Initially, each suspect has a prior level of suspicion. When a new clue B (like a fingerprint) appears, it might be very likely if a certain suspect A is guilty (high P(B|A)), but perhaps less likely if someone else did it. The detective weighs the plausibility of the clue under each suspect and updates their beliefs accordingly. The total number of people who could leave that fingerprint matters too; this is the base rate, encoded in P(B). Another analogy: weather and umbrellas. Suppose you believe there’s a 10% chance of rain (prior). You look outside and see wet streets (evidence). Wet streets are more likely if it rains (high P(wet|rain)) but can also occur due to sprinklers (non-zero P(wet|no rain)). Bayes says: multiply how much the evidence favors rain by how much you initially thought it would rain, then normalize by how often you’d see wet streets regardless. In medical testing, a very accurate test for a rare disease can still produce many false alarms, because most positives come from the large pool of healthy people—this is the base-rate effect. Computationally, imagine slicing a pie of prior belief among hypotheses, then stretching each slice by how well the evidence supports it; finally, you rescale so the slices sum to 1. That rescaling is the normalization by P(B). This mental model—prior slice × evidence stretch → renormalize—captures the essence of Bayesian updating.

03. Formal Definition

For events A and B with P(B) > 0, Bayes’ Theorem states:

P(A|B) = P(B|A) · P(A) / P(B)

Here P(A) is the prior, P(B|A) is the likelihood of observing B if A is true, and P(B) is the evidence, computed as P(B) = P(B|A)P(A) + P(B|¬A)P(¬A) in the binary case, or more generally by the law of total probability over a partition {A1, ..., Am}: P(B) = Σi P(B|Ai)P(Ai). In continuous settings with densities, the theorem becomes f(a|b) = f(b|a)f(a) / ∫ f(b|a′)f(a′) da′, with integrals replacing sums. A useful odds form is P(A|B)/P(¬A|B) = [P(A)/P(¬A)] · [P(B|A)/P(B|¬A)], where the second factor is the likelihood ratio (Bayes factor). Under the Naive Bayes assumption of conditional independence of features x1, ..., xd given class C, the joint likelihood factors as P(x1, ..., xd | C) = P(x1|C) · P(x2|C) · ... · P(xd|C), enabling efficient computation.

04. When to Use

Use Bayes’ Theorem whenever you want to update a probability based on new information and you can model how likely that information is under different hypotheses. Typical cases include: interpreting diagnostic tests (compute disease probability given a positive test), spam or sentiment classification with Naive Bayes (combine word-based evidence across many features), online learning or streaming updates (treat each observation as new evidence and update incrementally), and decision-making under uncertainty (compare posterior odds and act when they exceed a threshold). In competitive programming, turn problems framed as “given priors and conditional probabilities, find P(A|B)” into structured calculations: compute likelihoods, apply the total probability for normalization, and return the posterior with careful floating-point handling. When multiple mutually exclusive hypotheses exist (e.g., which biased die generated the sequence), Bayes provides a clean posterior over them. Bayesian conjugate priors (like Beta-Bernoulli) are ideal for coin-flip style problems, providing closed-form updates and predictions. Prefer Bayes when prior knowledge meaningfully shapes decisions, evidence arrives sequentially, or you need calibrated probabilities rather than hard classifications.

⚠️ Common Mistakes

Common pitfalls include:

  1. Ignoring base rates: people often conflate P(A|B) with P(B|A), overestimating the posterior when the condition is rare. Always compute or approximate P(B).
  2. Double counting evidence: using the same observation twice in a product inflates confidence; ensure features are distinct, or account for dependence.
  3. Violating Naive Bayes independence without caution: correlated features can overweight evidence. Mitigate by feature selection, decorrelation, or using models that capture dependence.
  4. Numerical underflow: multiplying many small probabilities can collapse to zero. Work in log-space and use log-sum-exp for normalization.
  5. Zero probabilities: unseen features give zero likelihood, killing a hypothesis. Use Laplace/additive smoothing.
  6. Misinterpreting posterior odds as accuracy: a high P(A|B) doesn’t mean your classifier is accurate overall; evaluate with proper metrics and consider priors.
  7. Overconfident priors: very strong priors can swamp data; test sensitivity to priors.
  8. Rounding early: keep double precision and normalize at the end to preserve accuracy.
  9. Forgetting to normalize across all hypotheses in multi-hypothesis problems: always divide by the total evidence sum.
  10. Mixing counts and probabilities inconsistently: ensure you convert counts to probabilities correctly, and keep track of whether values are in probability or log-probability space.

Key Formulas

Bayes' Theorem

P(A|B) = P(B|A) · P(A) / P(B)

Explanation: Posterior equals prior times likelihood divided by evidence. Use this to update a hypothesis after observing evidence.

Law of Total Probability

P(B) = Σi P(B|Ai) · P(Ai)

Explanation: The probability of the evidence is the sum over all mutually exclusive hypotheses of likelihood times prior. This normalizes the posterior.

Discrete Posterior

P(Ak|B) = P(B|Ak) P(Ak) / Σi P(B|Ai) P(Ai)

Explanation: For multiple hypotheses, compute the unnormalized weight of each and divide by the total to get a proper distribution.

Odds-Likelihood Form

P(A|B) / P(¬A|B) = [P(A) / P(¬A)] · [P(B|A) / P(B|¬A)]

Explanation: Posterior odds equal prior odds times the likelihood ratio (Bayes factor). Useful for sequential updates and threshold decisions.

Naive Bayes Posterior (proportional)

P(C | x1, ..., xd) ∝ P(C) · Πj P(xj|C)

Explanation: Assuming conditional independence, the joint likelihood factors into a product. Normalize over classes to get final probabilities.

Log-Sum-Exp Trick

log Σi exp(xi) = m + log Σi exp(xi − m), where m = maxi xi

Explanation: Recenter exponents by subtracting the maximum to avoid overflow/underflow, then add it back after summation in log-space.

Two-Hypothesis Update

posterior odds = K · prior odds, where K = P(B|H1) / P(B|H2); P(H1|B) = posterior odds / (1 + posterior odds)

Explanation: When comparing two hypotheses, the Bayes factor K scales the prior odds to get posterior odds, which convert to a probability.

Beta-Bernoulli Update

Beta(α, β) → Beta(α + s, β + f)

Explanation: With a Beta(α, β) prior and s successes, f failures, the posterior is Beta(α + s, β + f). This gives closed-form sequential updates.

Posterior Predictive (One-Step)

P(next success | data) = (α + s) / (α + s + β + f)

Explanation: Given a Beta posterior, the probability the next Bernoulli trial is a success is the posterior mean.

Beta-Binomial Predictive

P(s successes in n trials) = C(n, s) · B(α + s, β + f) / B(α, β)

Explanation: The marginal likelihood of observing s successes in n trials under a Beta prior. Useful for evidence and Bayes factor calculations.

Complexity Analysis

Computing a single binary Bayes update is O(1) time and O(1) space, since it involves a few arithmetic operations. For multiple hypotheses {A1, ..., Am}, a straightforward posterior computation requires evaluating each unnormalized weight P(B|Ai)P(Ai) and normalizing: this costs O(m) time and O(m) space for m hypotheses. In practice, when evidence is composed of many independent features (Naive Bayes), evaluating P(B|Ai) becomes a product over d features, leading to O(m·d) time per update. To prevent numerical underflow from multiplying small probabilities, implementations switch to log-space: summations of log-likelihoods stay stable and maintain the same asymptotic cost. Normalization via log-sum-exp also runs in O(m). For text classification with Naive Bayes, training requires building counts for a vocabulary of size V across C classes from N tokens, which is O(N) time and O(V·C) space; prediction for a document with L tokens is O(L·C) time and O(1) additional space beyond the model. In conjugate-prior models like Beta-Bernoulli, updates are O(1) per observation (just increment α or β) and O(1) space to store the parameters; computing marginal likelihoods using log-gamma functions is O(1) as well. Under competitive programming constraints, m and d are often up to 1e5; using logs, careful memory layout (vectors), and avoiding maps in tight loops can meet time limits. The main performance pitfalls are hash-map overhead in tokenization, repeated normalization inside loops (normalize once per evidence set), and excessive precision conversions.

Code Examples

Medical test interpretation with Bayes' Theorem (binary case)
#include <bits/stdc++.h>
using namespace std;

// Computes posterior probabilities for disease given a positive/negative test.
// Demonstrates Bayes' Theorem and the law of total probability.
int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example parameters
    // Prevalence (prior): P(Disease)
    double P_A = 0.01; // 1% prevalence
    // Sensitivity: P(Test+ | Disease)
    double P_B_given_A = 0.95; // 95%
    // Specificity: P(Test- | No Disease) = 0.98
    double specificity = 0.98;
    // Therefore, P(Test+ | No Disease) = 1 - specificity
    double P_B_given_notA = 1.0 - specificity; // 0.02

    // Evidence (law of total probability): P(Test+)
    double P_B = P_B_given_A * P_A + P_B_given_notA * (1.0 - P_A);

    // Posterior: P(Disease | Test+)
    double P_A_given_B = (P_B_given_A * P_A) / P_B;

    // For completeness, compute P(Disease | Test-)
    double P_notB_given_A = 1.0 - P_B_given_A;  // P(Test- | Disease)
    double P_notB_given_notA = specificity;     // P(Test- | No Disease)
    double P_notB = P_notB_given_A * P_A + P_notB_given_notA * (1.0 - P_A);
    double P_A_given_notB = (P_notB_given_A * P_A) / P_notB;

    cout.setf(std::ios::fixed); cout << setprecision(6);
    cout << "P(Test+) = " << P_B << "\n";
    cout << "P(Disease | Test+) = " << P_A_given_B << "\n";
    cout << "P(Disease | Test-) = " << P_A_given_notB << "\n";

    // Odds form demonstration for Test+
    double prior_odds = P_A / (1.0 - P_A);
    double likelihood_ratio = P_B_given_A / P_B_given_notA;
    double posterior_odds = prior_odds * likelihood_ratio;
    double posterior_prob_via_odds = posterior_odds / (1.0 + posterior_odds);
    cout << "Posterior via odds (sanity check) = " << posterior_prob_via_odds << "\n";

    return 0;
}

Computes the posterior probability of disease given a positive (and negative) test using Bayes' Theorem. It shows how to get P(B) using the law of total probability and cross-checks the posterior using the odds-likelihood form.

Time: O(1). Space: O(1).
Naive Bayes text classifier with Laplace smoothing (toy spam filter)
#include <bits/stdc++.h>
using namespace std;

// Simple tokenizer: splits on non-letters and lowercases.
vector<string> tokenize(const string &s) {
    vector<string> tokens;
    string w;
    for (char ch : s) {
        if (isalpha(static_cast<unsigned char>(ch))) {
            w.push_back(static_cast<char>(tolower(static_cast<unsigned char>(ch))));
        } else if (!w.empty()) {
            tokens.push_back(w);
            w.clear();
        }
    }
    if (!w.empty()) tokens.push_back(w);
    return tokens;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Toy training data: pair<label, message>
    vector<pair<string,string>> train = {
        {"spam", "Win a PRIZE now! Click here to claim"},
        {"spam", "Limited offer! Winner gets cash prize"},
        {"ham", "Lunch at noon? Meet at the cafe"},
        {"ham", "Project meeting moved to next week"},
        {"spam", "Claim your reward, limited time only"},
        {"ham", "Don't forget to bring the report"}
    };

    // Count words per class and total words per class
    unordered_map<string, int> classDocCount;         // number of documents per class
    unordered_map<string, long long> classTokenCount; // total tokens per class
    unordered_map<string, unordered_map<string, long long>> wordCount; // class -> word -> count
    unordered_set<string> vocabulary;

    for (auto &ex : train) {
        const string &label = ex.first;
        const string &msg = ex.second;
        classDocCount[label]++;
        vector<string> toks = tokenize(msg);
        classTokenCount[label] += (long long)toks.size();
        for (auto &w : toks) {
            wordCount[label][w]++;
            vocabulary.insert(w);
        }
    }

    // Compute class priors
    int totalDocs = (int)train.size();
    vector<string> classes;
    for (auto &kv : classDocCount) classes.push_back(kv.first);

    // Laplace smoothing parameter
    const double alpha = 1.0; // add-one smoothing
    const int V = (int)vocabulary.size();

    auto logP_class = [&](const string &cls) {
        return log((double)classDocCount[cls]) - log((double)totalDocs);
    };

    auto logP_word_given_class = [&](const string &w, const string &cls) {
        long long cnt_wc = 0;
        auto it = wordCount[cls].find(w);
        if (it != wordCount[cls].end()) cnt_wc = it->second;
        double num = cnt_wc + alpha;
        double den = (double)classTokenCount[cls] + alpha * V;
        return log(num) - log(den); // same as log(num/den)
    };

    auto classify = [&](const string &msg) {
        vector<string> toks = tokenize(msg);
        // Count tokens (multinomial NB)
        unordered_map<string,int> tf;
        for (auto &w : toks) tf[w]++;

        double bestLog = -1e300;
        string best = "";
        for (auto &cls : classes) {
            double lp = logP_class(cls);
            for (auto &kv : tf) {
                const string &w = kv.first;
                int c = kv.second;
                lp += c * logP_word_given_class(w, cls);
            }
            if (lp > bestLog) {
                bestLog = lp;
                best = cls;
            }
        }
        return best;
    };

    vector<string> tests = {
        "You are a winner! Claim your cash prize now",
        "Are we still on for lunch tomorrow?",
        "The report deadline is extended",
        "Click to win limited time reward"
    };

    for (auto &msg : tests) {
        cout << "Message: '" << msg << "' => predicted: " << classify(msg) << "\n";
    }

    return 0;
}

Implements a small Naive Bayes text classifier. It learns word counts per class with Laplace smoothing, computes class priors, and scores messages in log-space to avoid underflow. It predicts the class with the highest posterior.

Time: training O(N) over total tokens; prediction per message O(L · C). Space: O(V · C) for word counts.
Multi-hypothesis posterior with log-sum-exp normalization
#include <bits/stdc++.h>
using namespace std;

// Compute log-sum-exp of a vector of log-values for numerical stability.
double logsumexp(const vector<double> &x) {
    double m = -numeric_limits<double>::infinity();
    for (double v : x) m = max(m, v);
    double s = 0.0;
    for (double v : x) s += exp(v - m);
    return m + log(s);
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Hypotheses: different dice models explaining observed rolls
    // H0: fair die (1/6 each)
    // H1: loaded die favoring 6 (p6=0.4, others equal among remaining 0.6)
    // H2: loaded die favoring 1 (p1=0.35, others equal among remaining 0.65)

    vector<vector<double>> models = {
        {1/6.0, 1/6.0, 1/6.0, 1/6.0, 1/6.0, 1/6.0},
        {0.12, 0.12, 0.12, 0.12, 0.12, 0.40},
        {0.35, 0.13, 0.13, 0.13, 0.13, 0.13}
    };
    vector<double> prior = {0.6, 0.2, 0.2}; // priors over hypotheses H0, H1, H2

    // Observed evidence: sequence of die rolls (1..6)
    vector<int> rolls = {6,6,2,6,5,6,1,6,6,3,6};

    int m = (int)models.size();
    vector<double> log_prior(m), log_lik(m), log_post(m);

    for (int i = 0; i < m; ++i) log_prior[i] = log(prior[i]);

    // Compute log-likelihood for each hypothesis: sum of log p(roll|Hi)
    for (int i = 0; i < m; ++i) {
        double s = 0.0;
        for (int r : rolls) {
            double p = models[i][r-1];
            s += log(p);
        }
        log_lik[i] = s;
    }

    // Unnormalized log-posterior = log prior + log likelihood
    vector<double> log_unnorm(m);
    for (int i = 0; i < m; ++i) log_unnorm[i] = log_prior[i] + log_lik[i];

    // Normalize using log-sum-exp
    double lse = logsumexp(log_unnorm);
    for (int i = 0; i < m; ++i) log_post[i] = log_unnorm[i] - lse;

    cout.setf(std::ios::fixed); cout << setprecision(6);
    for (int i = 0; i < m; ++i) {
        cout << "Posterior P(H" << i << "|data) = " << exp(log_post[i]) << "\n";
    }

    return 0;
}

Computes posteriors over multiple hypotheses (three dice models) using log-probabilities and log-sum-exp for stable normalization. This pattern generalizes to many Bayesian inference tasks in contests.

Time: O(m · n), where m is the number of hypotheses and n the number of rolls. Space: O(m).
Beta-Bernoulli conjugate update and marginal likelihood
#include <bits/stdc++.h>
using namespace std;

// Log of Beta function via log-gamma: log B(a,b) = lgamma(a) + lgamma(b) - lgamma(a+b)
double logBeta(double a, double b) {
    return lgamma(a) + lgamma(b) - lgamma(a + b);
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Prior: Beta(alpha, beta) over Bernoulli success probability p
    double alpha = 2.0, beta = 2.0; // mildly informative prior centered at 0.5

    // Observations: s successes, f failures
    vector<int> data = {1,0,1,1,0,1,1,1,0,1}; // 1=success, 0=failure
    int s = 0, f = 0;
    for (int x : data) (x ? s : f)++;
    int n = s + f;

    // Posterior parameters
    double alpha_post = alpha + s;
    double beta_post = beta + f;

    // Posterior mean and MAP (if alpha_post,beta_post > 1)
    double posterior_mean = alpha_post / (alpha_post + beta_post);
    double posterior_map = (alpha_post > 1.0 && beta_post > 1.0) ?
        (alpha_post - 1.0) / (alpha_post + beta_post - 2.0) :
        numeric_limits<double>::quiet_NaN();

    // Marginal likelihood of observing s successes in n trials under the Beta prior (Beta-Binomial)
    // P(data) = C(n,s) * B(alpha+s, beta+f) / B(alpha, beta)
    auto logChoose = [&](int nn, int kk){ return lgamma(nn + 1.0) - lgamma(kk + 1.0) - lgamma(nn - kk + 1.0); };
    double log_marginal = logChoose(n, s) + logBeta(alpha + s, beta + f) - logBeta(alpha, beta);

    // Posterior predictive probability that next trial is a success
    double p_next_success = alpha_post / (alpha_post + beta_post);

    cout.setf(std::ios::fixed); cout << setprecision(6);
    cout << "s = " << s << ", f = " << f << "\n";
    cout << "Posterior Beta(alpha', beta') = Beta(" << alpha_post << ", " << beta_post << ")\n";
    cout << "Posterior mean p = " << posterior_mean << "\n";
    cout << "Posterior MAP p = " << posterior_map << "\n";
    cout << "log P(data) (marginal likelihood) = " << log_marginal << "\n";
    cout << "P(next success) = " << p_next_success << "\n";

    return 0;
}

Shows closed-form Bayesian updating for Bernoulli data with a Beta prior. It computes the posterior parameters, posterior mean/MAP, the marginal likelihood via the Beta-Binomial formula, and the posterior predictive probability of the next success.

Time: O(n) to scan observations, O(1) for updates and closed-form computations. Space: O(1).