Bayesian Hypothesis Testing Calculator

Enter your priors and evidence below. After you submit, the calculator reports posterior probabilities, the Bayes factor, odds, and a threshold check, along with export options and a sensitivity graph.

Calculator

Calculation Mode

Choose manual likelihoods or observed binomial data.

Prior Probability of H1

Enter a value between 0 and 1. A prior of exactly 0 or 1 cannot be changed by any evidence.

Decision Threshold

Common targets are 0.90, 0.95, or 0.99.

Likelihood P(E|H1)

How probable is the evidence under H1?

Likelihood P(E|H0)

How probable is the same evidence under H0?

Direct mode tip

Use this mode when you already know both likelihoods.

Observed Successes

Count the number of positive outcomes observed.

Observed Trials

Enter the total number of Bernoulli trials.

Success Probability Under H0

Expected success rate if H0 is true.

Success Probability Under H1

Expected success rate if H1 is true.

Binomial mode tip

This mode derives likelihoods from counts and assumed rates.

Example Data Table

These sample rows show how different priors and likelihoods shift the posterior under direct evidence mode.

Scenario	Prior H1	P(E|H1)	P(E|H0)	BF10	Posterior H1
Model uplift signal	0.35	0.68	0.28	2.4286	56.67%
Spam filter alert	0.25	0.74	0.21	3.5238	54.01%
Rare fault alarm	0.1	0.85	0.08	10.625	54.14%
Medical classifier event	0.4	0.62	0.31	2	57.14%
Fraud screen trigger	0.18	0.79	0.19	4.1579	47.72%
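
The rows above can be checked with a few lines of Python. This is a minimal sketch, not the calculator's own code: the function name `posterior_h1` and the `rows` list are illustrative assumptions, and P(H0) is taken to be 1 - P(H1).

```python
def posterior_h1(prior_h1, like_h1, like_h0):
    """Posterior P(H1|E) for two exhaustive hypotheses.

    Assumes P(H0) = 1 - P(H1); the name and signature are illustrative.
    """
    numerator = like_h1 * prior_h1
    return numerator / (numerator + like_h0 * (1.0 - prior_h1))


# (scenario, prior H1, P(E|H1), P(E|H0)) copied from the example table
rows = [
    ("Model uplift signal", 0.35, 0.68, 0.28),
    ("Spam filter alert", 0.25, 0.74, 0.21),
    ("Rare fault alarm", 0.10, 0.85, 0.08),
    ("Medical classifier event", 0.40, 0.62, 0.31),
    ("Fraud screen trigger", 0.18, 0.79, 0.19),
]

for name, prior, like_h1, like_h0 in rows:
    bf10 = like_h1 / like_h0  # Bayes factor, independent of the prior
    post = posterior_h1(prior, like_h1, like_h0)
    print(f"{name}: BF10 = {bf10:.4f}, posterior H1 = {post:.2%}")
```

The printed values should agree with the BF10 and Posterior H1 columns up to rounding. The "Rare fault alarm" row shows that a large Bayes factor can still leave the posterior near 50% when the prior is low.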

Formula Used

1) Posterior Probability

Bayes’ rule updates prior belief with new evidence. With two competing hypotheses, P(H0) = 1 - P(H1).

P(H1|E) = [P(E|H1) × P(H1)] / {[P(E|H1) × P(H1)] + [P(E|H0) × P(H0)]}
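
As a worked instance of this formula, the "Rare fault alarm" row from the example table (prior 0.1, P(E|H1) = 0.85, P(E|H0) = 0.08) can be substituted directly, taking P(H0) = 1 - 0.1 = 0.9:

```latex
P(H_1 \mid E)
  = \frac{0.85 \times 0.10}{0.85 \times 0.10 + 0.08 \times 0.90}
  = \frac{0.085}{0.157}
  \approx 0.5414
```

This matches the 54.14% shown in the table.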

2) Bayes Factor

The Bayes factor is the ratio of the two likelihoods: it measures how much more probable the evidence is under H1 than under H0.

BF10 = P(E|H1) / P(E|H0)

3) Posterior Odds

Posterior odds equal prior odds multiplied by the Bayes factor.

Posterior Odds = Prior Odds × BF10

Prior Odds = P(H1) / P(H0)
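
The odds form gives the same posterior as the probability form once the odds are converted back with p = odds / (1 + odds). The sketch below illustrates this; the function name `posterior_via_odds` is hypothetical, and it assumes 0 < P(H1) < 1 and P(E|H0) > 0.

```python
def posterior_via_odds(prior_h1, like_h1, like_h0):
    """Posterior P(H1|E) computed through the odds form of Bayes' rule.

    Assumes 0 < prior_h1 < 1 and like_h0 > 0 (illustrative helper).
    """
    prior_odds = prior_h1 / (1.0 - prior_h1)  # P(H1) / P(H0)
    bf10 = like_h1 / like_h0                  # Bayes factor
    posterior_odds = prior_odds * bf10
    # Convert odds back to a probability.
    return posterior_odds / (1.0 + posterior_odds)


# "Spam filter alert" row from the example table
print(round(posterior_via_odds(0.25, 0.74, 0.21), 4))
```

The result agrees with the 54.01% listed for that row.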

4) Binomial Evidence Mode

When successes and trials are entered, each hypothesis receives a binomial likelihood.

P(x|n,p) = C(n,x) × p^x × (1-p)^(n-x)

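Here x is the number of observed successes, n is the number of trials, and p is the success probability assumed under the hypothesis. The calculator then plugs the two resulting likelihoods into Bayes’ rule.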
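
A rough sketch of binomial mode follows. The function names and the input values (14 successes in 20 trials, rates 0.5 and 0.7, prior 0.5) are hypothetical and chosen only for illustration; the calculator's actual implementation is not shown on this page.

```python
from math import comb


def binomial_likelihood(x, n, p):
    """P(x | n, p) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1.0 - p) ** (n - x)


def binomial_posterior(x, n, p_h0, p_h1, prior_h1):
    """Return (BF10, posterior P(H1|x)) for two fixed success rates."""
    like_h0 = binomial_likelihood(x, n, p_h0)
    like_h1 = binomial_likelihood(x, n, p_h1)
    bf10 = like_h1 / like_h0  # C(n, x) cancels in this ratio
    numerator = like_h1 * prior_h1
    posterior = numerator / (numerator + like_h0 * (1.0 - prior_h1))
    return bf10, posterior


# Hypothetical inputs: 14 successes in 20 trials
bf10, post = binomial_posterior(14, 20, p_h0=0.5, p_h1=0.7, prior_h1=0.5)
print(f"BF10 = {bf10:.4f}, posterior H1 = {post:.4f}")
```

Because the binomial coefficient C(n,x) is the same under both hypotheses, it drops out of the Bayes factor and the posterior; only p^x × (1-p)^(n-x) differs between H0 and H1.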

How to Use This Calculator

1) Choose direct likelihood mode or binomial evidence mode.
2) Enter your prior probability for H1.
3) Set the decision threshold you want to test.
4) For direct mode, enter P(E|H1) and P(E|H0).
5) For binomial mode, enter successes, trials, and both success rates.
6) Click Calculate Posterior to update the result section.
7) Review posterior probabilities, Bayes factor, odds, and threshold message.
8) Use the graph to see how changing priors affects the posterior.
9) Download the current results as CSV or PDF if needed.
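
The prior-sensitivity view mentioned in the steps above can be approximated by holding the evidence fixed and sweeping the prior. How the calculator actually builds its graph is not documented here, so the grid of priors and the helper name `posterior_from_bf` below are assumptions.

```python
def posterior_from_bf(prior_h1, bf10):
    """Posterior P(H1|E) from a prior in (0, 1) and a Bayes factor."""
    posterior_odds = prior_h1 / (1.0 - prior_h1) * bf10
    return posterior_odds / (1.0 + posterior_odds)


# Evidence held fixed: the "Model uplift signal" row of the example table
bf10 = 0.68 / 0.28

# Assumed grid of priors: 0.05, 0.10, ..., 0.95
priors = [i / 20 for i in range(1, 20)]
curve = [(p, posterior_from_bf(p, bf10)) for p in priors]

for p, post in curve:
    print(f"prior = {p:.2f} -> posterior = {post:.4f}")
```

Each `(prior, posterior)` pair is one point on the curve. For a fixed Bayes factor, the posterior rises monotonically with the prior.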

FAQs

1) What does Bayesian hypothesis testing measure?

It updates the probability of competing hypotheses after observing evidence. Instead of only rejecting or failing to reject a null hypothesis, it directly quantifies how beliefs should change.

2) What is a Bayes factor?

A Bayes factor compares how well the evidence fits H1 versus H0. Values above 1 favor H1. Values below 1 favor H0. Larger departures from 1 indicate stronger evidence.

3) Why do priors matter so much?

Priors represent what you believed before seeing the data. The prior can dominate the posterior when evidence is limited or only mildly informative, that is, when the Bayes factor is close to 1.

4) When should I use direct likelihood mode?

Use direct mode when you already know the likelihood of the observed evidence under each hypothesis. It is helpful for structured expert judgment and precomputed model comparisons.

5) When should I use binomial evidence mode?

Use binomial mode when your data are counts of successes from repeated trials. The calculator converts those counts into likelihoods using the binomial probability model.

6) Does a high posterior always mean strong evidence?

Not always. A high posterior can arise from strong prior belief, strong evidence, or both. Reviewing the Bayes factor alongside the posterior helps separate these effects.
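
A small illustration of this point, using made-up inputs: the two cases below reach the same posterior, one driven entirely by the prior and the other entirely by the evidence. The helper name `posterior_from_bf` is hypothetical.

```python
def posterior_from_bf(prior_h1, bf10):
    """Posterior P(H1|E) from a prior in (0, 1) and a Bayes factor."""
    posterior_odds = prior_h1 / (1.0 - prior_h1) * bf10
    return posterior_odds / (1.0 + posterior_odds)


# Case A: strong prior, uninformative evidence (BF10 = 1)
case_a = posterior_from_bf(0.90, 1.0)

# Case B: neutral prior, strong evidence (BF10 = 9)
case_b = posterior_from_bf(0.50, 9.0)

print(round(case_a, 4), round(case_b, 4))
```

Both posteriors round to 0.9, yet only Case B reflects evidence that favors H1. The Bayes factor is what tells the two situations apart.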

7) What threshold should I choose?

Common thresholds are 0.90, 0.95, and 0.99. Higher thresholds demand stronger posterior certainty before you label one hypothesis as sufficiently supported.
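
To get a feel for what a threshold demands, the odds form of Bayes' rule can be rearranged to give the smallest Bayes factor that lifts a given prior to the threshold. This sketch assumes the threshold is compared against the posterior of H1; the function name `required_bf10` is hypothetical.

```python
def required_bf10(prior_h1, threshold):
    """Smallest BF10 for which P(H1|E) reaches the threshold.

    Derived from posterior odds = prior odds * BF10, assuming
    0 < prior_h1 < 1 and 0 < threshold < 1.
    """
    target_odds = threshold / (1.0 - threshold)
    prior_odds = prior_h1 / (1.0 - prior_h1)
    return target_odds / prior_odds


for t in (0.90, 0.95, 0.99):
    print(f"threshold {t:.2f}: BF10 >= {required_bf10(0.5, t):.1f}")
```

With a neutral prior of 0.5, the required Bayes factors are about 9, 19, and 99. A lower prior raises the bar: starting from a prior of 0.1, reaching 0.90 takes a Bayes factor of about 81.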

8) Can I use this for AI and machine learning decisions?

Yes. It is useful for model comparison, anomaly screening, A/B evidence updates, classifier event testing, and other workflows where prior belief and observed evidence both matter.