Experiment Flow Mini-Game
Why this calculator makes a good mini-game
Sample size planning is about balancing signal vs. noise. This mini-game turns that tension into a tactile exercise—every visitor you route teaches how allocation, effect size, and confidence interact.
• Arrow keys ←/→ nudge bias ±5%. Space pauses.
• Conversions trigger bursts and scoring; stay within ±5% of the target split to earn streak multipliers.
Provide parameters, then click to play.
Finish with high signal (Z-score) and balanced traffic to max your score.
Understanding A/B Test Sample Size
Why Sample Size Matters
Many A/B tests fail to detect real improvements because sample sizes are too small. With insufficient data, you risk "false negatives" (not detecting an improvement that exists) or "false positives" (claiming an improvement that's just statistical noise). This calculator determines how many visitors you need in each variant to reliably detect a meaningful improvement with specified confidence and statistical power.
Key Concepts
Baseline Conversion Rate: Your current control performance (e.g., 2.5% converts)
Minimum Detectable Effect: Smallest improvement worth detecting (e.g., 20% improvement = 2.5% → 3%)
Significance level (α): Acceptable risk of a false positive (typically 5%, i.e., 95% confidence)
Power (1-β): Probability of detecting a true effect of the specified size (typically 80%); the sketch below shows how these inputs combine
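For readers who want to see how these inputs combine, here is a minimal sketch of the standard two-proportion sample-size formula (normal approximation, two-sided test). The function name and defaults are illustrative; the calculator itself uses a simplified formula, so its outputs can differ.

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline, rel_mde, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant for a two-sided two-proportion z-test."""
    p1 = baseline                      # control conversion rate
    p2 = baseline * (1 + rel_mde)      # variant rate at the minimum detectable effect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 at 95% confidence
    z_beta = NormalDist().inv_cdf(power)            # 0.84 at 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Example: 2% baseline, 25% relative improvement, default 95% confidence / 80% power
print(sample_size_per_variant(0.02, 0.25))
```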
Sample Size Requirements by Baseline & Effect
| Baseline CR | 10% Improvement | 20% Improvement | 50% Improvement |
|---|---|---|---|
| 1% (E-commerce) | ~39,000 per variant | ~9,900 per variant | ~1,600 per variant |
| 5% (SaaS free trial) | ~7,750 per variant | ~1,950 per variant | ~320 per variant |
| 10% (Newsletter signup) | ~3,900 per variant | ~980 per variant | ~160 per variant |
| 50% (High engagement) | ~385 per variant | ~96 per variant | ~16 per variant |
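As a cross-check, the sample_size_per_variant sketch above can generate a similar grid. Because it includes the power term while the calculator uses a simplified formula, its figures tend to be larger; treat both as rough planning estimates rather than exact requirements.

```python
baselines = [0.01, 0.05, 0.10, 0.50]       # rows of the table above
improvements = [0.10, 0.20, 0.50]          # relative minimum detectable effects

for p in baselines:
    cells = [f"{sample_size_per_variant(p, lift):,}" for lift in improvements]
    print(f"{p:.0%} baseline: " + " | ".join(cells) + " per variant")
```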
Worked Example: E-Commerce Landing Page
Scenario: Your landing page converts 2% of visitors and you want to test a new CTA button. You want to be able to reliably detect a 25% improvement (2% → 2.5%).
- Baseline: 2%
- Minimum Effect: 25% improvement
- Confidence: 95% (standard)
- Power: 80% (standard)
- Result: 3,100 per variant, 6,200 total
- At 1,000 daily visitors split across both variants: 6.2 days test duration (see the duration sketch below)
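Turning a sample size into a test duration is simple division: total sample divided by daily traffic. A small sketch using the figures from the example above (a 50/50 traffic split is assumed):

```python
def test_duration_days(per_variant, variants, daily_visitors):
    """Days to reach the required sample when daily traffic is split across all variants."""
    return per_variant * variants / daily_visitors

# Worked example above: 3,100 per variant, two variants, 1,000 total visitors per day
print(test_duration_days(3_100, 2, 1_000))   # 6.2 days
```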
How to Reduce Sample Size
- Accept larger effect sizes: If you only need to detect 50% improvements (not 10%), sample size drops dramatically; required samples scale with the inverse square of the effect size (the sketch after this list compares these levers)
- Reduce confidence level: 90% instead of 95% cuts sample size ~25%
- Reduce power: 70% instead of 80% cuts sample size ~20%
- Increase traffic or test higher-converting steps: More daily traffic reaches the required sample size sooner, and higher baseline conversion rates need smaller samples
- Test only what matters: Don't test trivial changes; focus on high-impact variations
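To see how much each lever matters, vary one input at a time. A sketch reusing the sample_size_per_variant function from earlier (all parameter values are illustrative):

```python
base = sample_size_per_variant(0.02, 0.25)   # 95% confidence, 80% power, 25% MDE
print("baseline settings:      ", base)
print("90% confidence instead: ", sample_size_per_variant(0.02, 0.25, alpha=0.10))
print("70% power instead:      ", sample_size_per_variant(0.02, 0.25, power=0.70))
print("50% MDE instead of 25%: ", sample_size_per_variant(0.02, 0.50))
```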
Important Limitations & Assumptions
- This calculator uses simplified formulas; the exact sample size depends on the statistical test used
- Assumes the normal approximation to the binomial; very low conversion rates (<1%) may need adjustment or an exact test
- Does not account for multiple testing corrections or sequential testing
- Assumes constant traffic throughout test duration; actual variation may increase required time
- Does not account for novelty effects that wear off after initial exposure
Understanding Statistical Power and Type II Errors
Statistical power (typically 80%) is the probability of detecting a true effect of the targeted size if it exists. In A/B testing, insufficient power means you risk missing real improvements (a Type II error). If a real improvement of the targeted size exists and you run a test with only 60% power, you will fail to detect it 40% of the time. This is why 80% is the industry standard: it balances the cost of longer tests against the risk of missing improvements. Using this calculator with 80% power gives you an 80% chance of detecting an improvement at least as large as your minimum detectable effect. High-value tests (those where detecting an improvement protects significant revenue) may justify 90% power, at the cost of larger samples.
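The relationship also runs in reverse: given a sample size, you can estimate the power you would achieve for a given effect. A minimal sketch under the same normal approximation (the function and the example inputs are illustrative):

```python
from statistics import NormalDist

def achieved_power(n_per_variant, baseline, rel_mde, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test with n visitors per variant."""
    p1, p2 = baseline, baseline * (1 + rel_mde)
    se = ((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_variant) ** 0.5
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Chance the test statistic clears the critical value when the true lift really is rel_mde
    return 1 - NormalDist().cdf(z_alpha - abs(p2 - p1) / se)

# Illustrative: roughly 83% power with 15,000 visitors per variant at 2% baseline, 25% MDE
print(achieved_power(15_000, 0.02, 0.25))
```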
Multiple Testing and Statistical Corrections
A common mistake in A/B testing is running many tests without correcting for multiple comparisons. If you run 10 independent tests at 95% confidence and none of the changes has any real effect, the chance of seeing at least one false positive is about 40% (1 - 0.95^10). This is why multiple-comparison corrections (such as Bonferroni) and sequential testing methods exist. This calculator, however, assumes a single hypothesis test. If you are conducting multiple analyses (peeking at results daily, testing multiple variants, etc.), you need stricter significance thresholds or explicit corrections. In practice, this is a primary source of A/B testing failures.
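One simple (and conservative) correction is Bonferroni: divide α by the number of comparisons and plan the sample size at that stricter threshold. A sketch reusing sample_size_per_variant from earlier (the number of comparisons is illustrative):

```python
def bonferroni_alpha(alpha, num_comparisons):
    """Stricter per-comparison significance level that keeps the overall false-positive budget."""
    return alpha / num_comparisons

# Planning one of 5 simultaneous comparisons against an overall 5% false-positive budget
adjusted = bonferroni_alpha(0.05, 5)                         # 0.01
print(sample_size_per_variant(0.02, 0.25, alpha=adjusted))   # noticeably larger than the uncorrected plan
```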
Common A/B Testing Pitfalls
- Peeking at results: Checking test results repeatedly before reaching the planned sample size inflates the false positive rate. The test was designed for a specific sample size; looking early breaks those guarantees (the simulation after this list shows how quickly the error rate inflates).
- Stopping early for a winner: Even if results look positive early, you must reach the calculated sample size. "Early winners" are often statistical noise that disappears with more data.
- Testing too many variants: Each additional variant requires more total traffic. Testing five variants against a control means six arms instead of two, roughly tripling the total traffic of a simple A vs B test before any multiple-comparison correction.
- Ignoring seasonal effects: Conversion rates vary by day of week, season, and holiday. A test running only on weekends may not reflect overall performance.
- Neglecting external events: Press coverage, competitor launches, and market changes affect conversion rates. Control for external factors.
- Low baseline conversion rates: Tests with <0.5% baseline conversion rates require enormous sample sizes. Consider testing on higher-traffic pages or accepting larger minimum detectable effects.
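The peeking pitfall is easy to demonstrate with a small simulation: run A/A tests (no real difference), check a z-test after every traffic batch, and stop at the first "significant" result. The false-positive rate climbs well above the nominal 5%. A minimal sketch with made-up traffic numbers:

```python
import random
from statistics import NormalDist

def peeking_false_positive_rate(p=0.05, batch=1_000, peeks=10, sims=500, alpha=0.05):
    """Fraction of A/A tests declared 'significant' when results are checked after every batch."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    false_positives = 0
    for _ in range(sims):
        conv_a = conv_b = n_a = n_b = 0
        for _ in range(peeks):
            conv_a += sum(random.random() < p for _ in range(batch))
            conv_b += sum(random.random() < p for _ in range(batch))
            n_a += batch
            n_b += batch
            pooled = (conv_a + conv_b) / (n_a + n_b)
            se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
            if se > 0 and abs(conv_a / n_a - conv_b / n_b) / se > z_crit:
                false_positives += 1
                break
    return false_positives / sims

print(peeking_false_positive_rate())   # typically well above the nominal 0.05
```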
Real-World A/B Testing Scenarios
SaaS Free Trial Sign-up: 8% baseline conversion, want to detect a 15% improvement (8% → 9.2%). At 95% confidence and 80% power: ~3,850 per variant, ~7,700 total. At 1,000 daily visitors to the sign-up flow (500 per variant): roughly 8 days of test duration. Reasonable for feature validation.
High-Traffic E-commerce Checkout: 2% baseline conversion, want to detect a 20% improvement (2% → 2.4%). With 50,000 daily visitors, you reach the required sample size in under a day. This justifies testing small improvements on high-traffic pages.
Low-Traffic Email Campaign: 0.5% baseline click rate, want to detect a 50% improvement (0.5% → 0.75%). At 95% confidence and 80% power: ~3,200 per variant, ~6,400 total. If each weekly send reaches 2,000 subscribers (1,000 per variant), that is roughly 3-4 sends, or 3+ weeks. Consider accepting larger effect sizes or running longer campaigns.
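The same duration arithmetic from the worked example applies to these scenarios; the per-variant figures below are the ones quoted above, and the traffic assumptions are illustrative:

```python
scenarios = [
    # (name, per-variant sample size quoted above, daily units split across both variants)
    ("SaaS free trial sign-up", 3_850, 1_000),   # 1,000 visitors/day to the sign-up flow
    ("Email campaign",          3_200, 2_000),   # one weekly send reaching 2,000 subscribers
]

for name, per_variant, daily in scenarios:
    periods = 2 * per_variant / daily
    print(f"{name}: about {periods:.1f} days or sends to reach sample size")
```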
Moving Beyond Binary Comparisons
This calculator focuses on detecting differences between two variants. Modern A/B testing increasingly uses multivariate testing (testing multiple elements simultaneously) and continuous experimentation platforms (always-on testing infrastructure). In multivariate tests, sample size requirements grow quickly with the number of element combinations tested. Continuous experimentation requires infrastructure to track test assignments and results, but it shortens the time between hypothesis and decision.
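A rough way to see how traffic needs scale with more arms: each arm needs approximately the per-variant sample from a pairwise comparison (before any multiple-comparison correction), so total traffic grows with the number of arms. A sketch reusing sample_size_per_variant (arm counts are illustrative):

```python
per_arm = sample_size_per_variant(0.02, 0.25)

for arms in (2, 4, 8):   # e.g. A/B, a 2x2 multivariate test, a 2x2x2 multivariate test
    print(f"{arms} arms: about {arms * per_arm:,} total visitors (before corrections)")
```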
Bayesian vs. Frequentist A/B Testing
This calculator uses frequentist statistics (classical hypothesis testing). Bayesian A/B testing is an alternative approach that treats effect size as a probability distribution and allows stopping decisions based on posterior probability of superiority. Bayesian methods can enable earlier stopping when results are clear or stopping when the difference is too small to matter. However, Bayesian tests still require sample size planning for proper calibration—you can't simply stop whenever results look good. Both approaches require disciplined methodology to avoid false positives.
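For contrast with the frequentist planning above, here is a minimal sketch of the Bayesian comparison described in this section: model each variant's conversion rate with a Beta posterior and estimate the probability that B beats A by Monte Carlo sampling. The priors, counts, and sample count are all illustrative assumptions, not the calculator's method.

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, samples=100_000):
    """Monte Carlo estimate of P(rate_B > rate_A) under uniform Beta(1, 1) priors."""
    wins = 0
    for _ in range(samples):
        rate_a = random.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = random.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / samples

# Illustrative counts only: 200/10,000 vs 245/10,000 conversions
print(prob_b_beats_a(200, 10_000, 245, 10_000))   # posterior probability that B is better
```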
Summary
Proper sample size planning is foundational to valid A/B testing. This calculator helps you determine the sample size needed to detect meaningful improvements with high probability. Remember: reach your calculated sample size before making decisions, avoid peeking at results, correct for multiple testing, and account for external factors. A/B testing failures are often not due to the math being wrong, but due to violations of test design assumptions—respecting those assumptions ensures your tests deliver actionable insights. Use this tool to plan your tests rigorously, then execute without deviation from the plan.
