P-value Calculator

Calculate p-values for z-tests, t-tests, and chi-square tests. One-tailed and two-tailed results with instant significance interpretation at any alpha level.

What Is a P-value and How Do You Use It?

A p-value is the probability of obtaining results at least as extreme as your observed data, assuming the null hypothesis is true. It is the cornerstone of statistical hypothesis testing used in medicine, psychology, economics, biology, and every quantitative field. A small p-value means your data would be unlikely to occur by chance if the null hypothesis were true — giving you evidence to reject it.

The p-value does not tell you the probability that your hypothesis is correct, nor the probability that the result was due to chance. This is the single most common misinterpretation in science. If p = 0.03, it means results this extreme would occur only 3% of the time under the null hypothesis — not that there is a 3% chance you are wrong. Your significance level (alpha) — set before data collection — determines your threshold. If p is less than alpha, reject the null hypothesis.
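
To make the definition concrete, here is a minimal sketch of the calculation in Python; the use of SciPy is our choice, not something the page prescribes, and z = 1.96 is just a familiar example value.

```python
from scipy.stats import norm

z = 1.96      # observed test statistic
alpha = 0.05  # significance level fixed before data collection

# Two-tailed p-value: probability of a statistic at least this extreme
# in either direction, assuming the null hypothesis is true.
p = 2 * norm.sf(abs(z))  # sf(x) = 1 - cdf(x), the survival function

print(f"p = {p:.4f}")  # p = 0.0500
print("reject H0" if p < alpha else "fail to reject H0")
```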

The threshold alpha = 0.05 (5%) is a convention from Ronald Fisher in the 1920s, not a universal truth. Physics requires p less than 0.0000003 (five sigma) for new particle discoveries. The FDA often requires p less than 0.025 for drug approvals (two-sided). Some psychology journals now recommend p less than 0.005 to reduce false positives. Always choose alpha before you collect data — changing it after seeing the results is a form of p-hacking and invalidates the test.

Which Test Should You Use?

| Test | Use When | Requires |
|---|---|---|
| Z-test | Large sample (n > 30) or known population SD | Normal distribution, continuous data |
| T-test | Small sample (n < 30), unknown population SD | Approximately normal data |
| Chi-square | Categorical data, independence or goodness-of-fit | Expected cell counts ≥ 5 |
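
As a sketch of how each test's p-value is computed (SciPy assumed, and the input values are arbitrary examples), all three come from a distribution's survival function:

```python
from scipy.stats import norm, t, chi2

# Two-tailed z-test, z = 1.96
p_z = 2 * norm.sf(abs(1.96))     # ~0.0500

# Two-tailed t-test, t = 2.5 with df = 20
p_t = 2 * t.sf(abs(2.5), df=20)  # ~0.0213

# Right-tailed chi-square test, statistic 5.991 with df = 2
p_c = chi2.sf(5.991, df=2)       # ~0.0500

print(p_z, p_t, p_c)
```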

One-Tailed vs Two-Tailed Tests

A two-tailed test checks for a difference in either direction; use it by default. A one-tailed test checks for a difference in one specific direction only. It is more powerful, but it must be justified by a directional hypothesis stated before data collection. One-tailed tests halve the p-value compared to two-tailed tests, which is why they're sometimes misused to push borderline results past the significance threshold.
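
A quick numeric illustration of that halving (Python with SciPy, our assumption; z = 1.7 is a made-up borderline statistic):

```python
from scipy.stats import norm

z = 1.7  # statistic falling in the hypothesized direction

p_one = norm.sf(z)      # one-tailed: ~0.0446, below 0.05
p_two = 2 * norm.sf(z)  # two-tailed: ~0.0891, above 0.05

# For a symmetric distribution the one-tailed p is exactly half the
# two-tailed p, which is how borderline results get pushed past alpha.
print(p_one, p_two)
```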

| P-value | Interpretation | Notation |
|---|---|---|
| p < 0.001 | Extremely significant | *** |
| 0.001 to 0.01 | Very significant | ** |
| 0.01 to 0.05 | Significant | * |
| 0.05 to 0.10 | Marginal trend | |
| p ≥ 0.10 | Not significant | ns |
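
In code, the table above reduces to a small lookup. This hypothetical helper follows one common star-notation convention:

```python
def significance_notation(p: float) -> str:
    """Map a p-value to the notation used in the table above."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    if p < 0.10:
        return "(marginal)"
    return "ns"

print(significance_notation(0.032))  # *
```
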
💡 Pro Tip — Statistical vs Practical Significance: A p-value tells you whether an effect exists, not whether it matters. With 50,000 subjects, a blood pressure reduction of 0.5 mmHg might produce p < 0.001 but have zero clinical relevance. Always report an effect size (Cohen's d, odds ratio, or R²) alongside your p-value. Statistical significance without practical significance is a common pitfall in large-dataset research.
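
To see the distinction concretely, here is an illustrative simulation (Python with NumPy and SciPy, our assumption; the numbers are invented to echo the blood-pressure example):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)

# Hypothetical trial: 50,000 subjects per arm, a true 0.5 mmHg
# reduction against a between-subject SD of about 10 mmHg.
control   = rng.normal(120.0, 10.0, 50_000)
treatment = rng.normal(119.5, 10.0, 50_000)

_, p = ttest_ind(treatment, control)

# Cohen's d: the mean difference in pooled-SD units.
pooled_sd = np.sqrt((control.var(ddof=1) + treatment.var(ddof=1)) / 2)
d = (control.mean() - treatment.mean()) / pooled_sd

print(f"p = {p:.1e}, Cohen's d = {d:.2f}")  # highly significant, tiny effect
```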

Understanding the T-distribution and Chi-square Distribution

The t-distribution was developed by William Sealy Gosset in 1908 (published under the pseudonym "Student") to handle small-sample statistics. It resembles the normal distribution but with heavier tails — reflecting the extra uncertainty when estimating the population standard deviation from a small sample. As degrees of freedom (df = n - 1) increase, the t-distribution converges to the standard normal. At df = 120, the difference is negligible. For small samples (df < 30), the t-distribution produces larger critical values and therefore higher p-values than the z-test for the same test statistic — this is the correct conservative behavior.
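
The convergence is easy to verify numerically (SciPy assumed; the statistic 2.0 is arbitrary):

```python
from scipy.stats import norm, t

stat = 2.0
p_z = 2 * norm.sf(stat)  # ~0.0455

for df in (5, 10, 30, 120):
    p_t = 2 * t.sf(stat, df=df)
    print(f"df = {df:>3}: t-test p = {p_t:.4f} vs z-test p = {p_z:.4f}")
# The t-test p-value shrinks toward the z-test value as df grows:
# ~0.1019 at df=5, ~0.0734 at df=10, ~0.0546 at df=30, ~0.0478 at df=120.
```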

The chi-square distribution is always right-skewed and defined only for non-negative values. It arises when you square standard normal variables: if Z is standard normal, then Z² follows chi-square with 1 degree of freedom. A chi-square test with df degrees of freedom tests whether observed categorical frequencies differ from expected. For a 2x2 contingency table (two binary variables), df = (rows - 1)(columns - 1) = 1. For a 3x4 table, df = 2 x 3 = 6. The chi-square statistic equals the sum of (Observed - Expected)² / Expected across all cells.
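
A short sketch, assuming SciPy and a made-up 2x2 table, showing the statistic computed directly from that definition:

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Hypothetical 2x2 contingency table: df = (2 - 1)(2 - 1) = 1.
observed = np.array([[30, 20],
                     [20, 30]])

# Expected counts under independence: row total * column total / grand total.
row = observed.sum(axis=1, keepdims=True)
col = observed.sum(axis=0, keepdims=True)
expected = row * col / observed.sum()

stat = ((observed - expected) ** 2 / expected).sum()
p = chi2.sf(stat, df=1)  # right-tailed by construction

print(stat, p)  # 4.0, ~0.0455

# Cross-check against SciPy; correction=False disables the Yates
# continuity correction it would otherwise apply to 2x2 tables.
stat2, p2, dof, _ = chi2_contingency(observed, correction=False)
print(stat2, p2, dof)
```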

A common mistake is applying chi-square when expected cell counts are less than 5. With small expected counts, the chi-square approximation breaks down and you should use Fisher's exact test instead. Fisher's exact test calculates the exact probability of observing your contingency table or a more extreme one, without relying on distributional approximations.
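
With SciPy (our assumption), Fisher's exact test is a single call; the table here is hypothetical, chosen so the expected counts are small:

```python
from scipy.stats import fisher_exact

# Hypothetical 2x2 table where the chi-square approximation is unreliable.
table = [[8, 2],
         [1, 9]]

odds_ratio, p = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.1f}, p = {p:.4f}")  # odds ratio = 36.0
```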

The Multiple Comparisons Problem

If you run 20 independent hypothesis tests at alpha = 0.05, you expect one false positive on average — even when all null hypotheses are true. This is the multiple comparisons problem. Solutions include the Bonferroni correction (divide alpha by the number of tests), the Benjamini-Hochberg procedure for controlling the false discovery rate, and pre-registration of hypotheses before data collection. In genomics studies testing millions of genetic variants, researchers use alpha = 5 × 10⁻⁸ as the genome-wide significance threshold to account for this.
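
The arithmetic is easy to simulate, and both corrections fit in a few lines (NumPy and SciPy assumed):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)

# 20 two-tailed tests in which every null hypothesis is true.
m = 20
p = 2 * norm.sf(np.abs(rng.standard_normal(m)))

print((p < 0.05).sum())      # ~1 false positive on average

# Bonferroni correction: test each p-value against alpha / m.
print((p < 0.05 / m).sum())  # usually 0

# Benjamini-Hochberg at FDR q: reject the k smallest p-values, where k
# is the largest index with p_(k) <= (k / m) * q.
q = 0.05
p_sorted = np.sort(p)
passing = np.nonzero(p_sorted <= np.arange(1, m + 1) / m * q)[0]
k = passing[-1] + 1 if passing.size else 0
print(k)                     # usually 0 here, since all nulls are true
```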

💡 Pro Tip — Report Exact P-values: Modern journals and the APA style guide require reporting the exact p-value (p = 0.032) rather than just the comparison (p < 0.05). Exact p-values allow readers to apply different alpha thresholds and enable future meta-analyses to combine your results with other studies. When p is extremely small, report it as p < 0.001 or p < 0.0001 rather than p = 0.000.
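
A hypothetical helper that follows this reporting rule:

```python
def format_p(p: float) -> str:
    """Report an exact p-value, switching to an inequality when p is tiny."""
    return "p < 0.001" if p < 0.001 else f"p = {p:.3f}"

print(format_p(0.032))    # p = 0.032
print(format_p(0.00004))  # p < 0.001
```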

Frequently Asked Questions

What does p = 0.05 actually mean?
A p-value of 0.05 means that if the null hypothesis were true, you would observe data as extreme as yours about 5% of the time. It does not mean there is a 5% chance the result was due to chance, nor that the null hypothesis has a 5% probability of being true. The 0.05 threshold is a conventional decision rule: you accept a 5% long-run false positive rate (Type I error rate). This convention was suggested by Fisher in 1925 as a rough guide, not a scientific absolute.
What is the difference between Type I and Type II errors?
A Type I error (false positive) occurs when you reject the null hypothesis when it is actually true — concluding there is an effect when there isn't. The probability of a Type I error equals your alpha level (0.05 means a 5% chance of false positives). A Type II error (false negative) occurs when you fail to reject a false null hypothesis — missing a real effect. The probability of a Type II error is called beta, and 1 - beta is called statistical power. Most studies aim for 80% power (beta = 0.20), meaning a 20% chance of missing a real effect of the specified size.
When should I use a t-test vs a z-test?
Use a t-test when you don't know the population standard deviation and must estimate it from your sample — which is almost always in practice. Use a z-test only when the population standard deviation is known exactly (rare) or your sample is very large (n > 30) and the sampling distribution is approximately normal. As a practical rule, default to the t-test. With large samples, the t and z tests produce nearly identical results, so there is no harm in always using the t-test.
How do I interpret a chi-square p-value?
A chi-square p-value tells you the probability of observing a chi-square statistic as large as yours (or larger) if the null hypothesis were true. The chi-square test is always right-tailed because the statistic is always non-negative and larger values always indicate greater departure from the null. If your chi-square p-value is less than your alpha, you reject the null — concluding that the categorical distributions differ (goodness-of-fit test) or that the two variables are not independent (test of independence).
Can a p-value tell me that my hypothesis is correct?
No. A p-value can only tell you whether your data are inconsistent with the null hypothesis. Rejecting H₀ means your data are unlikely under H₀ — not that your alternative hypothesis H₁ is proven true. There could be many other explanations for your result. Statistical significance is evidence against the null, not proof of the alternative. This is why replication, pre-registration, effect sizes, and confidence intervals are all necessary for drawing sound scientific conclusions.
What is statistical power and how does it relate to p-values?
Statistical power is the probability of detecting a real effect when it exists (1 - Type II error rate). Power depends on sample size, effect size, and alpha level. Larger samples and larger true effects both increase power. A study with low power (say 40%) will frequently produce p > 0.05 even when a real effect exists — yielding a false negative. Power analysis before data collection determines the sample size needed to detect effects of a given size with desired probability. A commonly cited target is 80% power, requiring careful calculation based on the expected effect size in your specific research area.
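
As an illustration, statsmodels (assuming it is installed) can solve for the required sample size directly:

```python
from statsmodels.stats.power import TTestIndPower

# Sample size per group for 80% power to detect a medium effect
# (Cohen's d = 0.5) at alpha = 0.05, two-sided.
n = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05,
                                power=0.80, alternative="two-sided")
print(f"n per group ≈ {n:.0f}")  # ~64
```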