Estimate a defensible sample size before recruitment begins. This planner helps you compare study designs, pressure-test assumptions, and translate abstract power targets into something operational: participants, budget, and feasibility.
Introduction
Good experiments are designed twice: first on paper, then in the lab, clinic, field site, or product environment where the data are actually collected. The paper version matters because it forces you to answer a simple question before you spend time and money: how many observations do you need to give your study a fair chance of detecting the effect you care about? This calculator is built for that early planning stage. It does not replace a full statistical analysis plan, but it gives you a fast, transparent estimate of the sample size required for common study types and shows how that estimate changes when your assumptions become more conservative or more ambitious.
The practical reason to run a sample-size calculation is not just to satisfy a methods section. A study that is too small can fail to detect a real effect, which means you may spend months collecting data only to end with an inconclusive result. A study that is unnecessarily large can waste budget, prolong timelines, expose more participants than needed, and create operational burden for your team. Planning a realistic sample size is therefore a scientific question and a resource-management question at the same time. That is why this page includes not only the core estimate per group, but also attrition, compliance, and cost assumptions that often determine whether a design is actually feasible.
This tool is especially useful when you are comparing scenarios. You might begin with a medium expected effect and standard 80% power, then ask what happens if the true effect is smaller, if you want 90% power, or if dropout is worse than hoped. Those comparisons are often more valuable than a single number, because they reveal which assumptions are driving your recruitment target. If a modest change in effect size doubles the required sample, that tells you where your uncertainty sits. If compliance is low, you may learn that improving the protocol could save more participants than relaxing a statistical threshold.
How this calculator should be used
Start by choosing the study type that most closely matches your design. A two-sample t-test is the default for comparing two independent groups. A paired design is more efficient when the same participants are measured twice, because each person serves as part of their own control. ANOVA is appropriate when you have three or more groups. Correlation or regression is meant for studies focused on the strength of association rather than a group mean difference. Proportion comparisons are included for binary outcomes such as success or failure, conversion or no conversion, infected or not infected.
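The efficiency gain from a paired design can be sketched numerically. The snippet below is a rough illustration under a normal approximation, not the calculator's own method. It assumes that `d` is the standardized difference on the raw outcome and that `rho`, a hypothetical input the page does not ask for, is the correlation between the two measurements on the same person.

```python
from math import ceil
from statistics import NormalDist


def n_pairs(d: float, rho: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of pairs for a two-tailed paired comparison.

    Assumption: the variance of a within-person difference is
    2 * SD^2 * (1 - rho), so higher correlation means fewer pairs.
    """
    if d == 0:
        raise ValueError("d must be non-zero")
    if not -1 < rho < 1:
        raise ValueError("rho must lie strictly between -1 and 1")
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil(2 * (1 - rho) * z_sum ** 2 / d ** 2)


# Hypothetical correlations, chosen only to show the trend.
for rho in (0.0, 0.5, 0.8):
    print(f"rho={rho:.1f}: about {n_pairs(0.50, rho)} pairs")
```

With uncorrelated measurements the number of pairs matches the per-group figure for an independent design, but each person contributes both measurements, so the number of participants is still halved. As the correlation rises, the saving grows.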
Next, choose whether your hypothesis is two-tailed or one-tailed. Two-tailed tests are standard in most confirmatory work because they allow for effects in either direction. A one-tailed test should only be used when a directional prediction is justified in advance and the opposite direction would not count as evidence for your research claim. Then set your significance level, alpha, and desired power. Alpha is the false-positive rate you are willing to tolerate. Power is the probability of detecting a true effect of the size you specify. In most applied research, alpha of 0.05 and power of 0.80 are conventional starting points, but more stringent designs often use 0.90 power or alpha of 0.01.
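To see how these choices enter the arithmetic, the sketch below converts alpha, the number of tails, and power into standard normal quantiles. The function names are illustrative and are not taken from the calculator; only Python's standard library is used.

```python
from statistics import NormalDist


def critical_z(alpha: float, two_tailed: bool = True) -> float:
    """Quantile for the significance threshold.

    A two-tailed test splits alpha across both tails; a one-tailed test
    puts all of it in one tail, which lowers the threshold.
    """
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


def power_z(power: float) -> float:
    """Quantile corresponding to the desired power (1 - beta)."""
    return NormalDist().inv_cdf(power)


print(round(critical_z(0.05), 3))                    # two-tailed, alpha 0.05
print(round(critical_z(0.05, two_tailed=False), 3))  # one-tailed, alpha 0.05
print(round(critical_z(0.01), 3))                    # two-tailed, alpha 0.01
print(round(power_z(0.80), 3), round(power_z(0.90), 3))
```

Stricter alpha and higher power both raise these quantiles, which is why they raise the required sample.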
The effect size input deserves special care because it often drives the final answer. In plain language, effect size describes how large a difference or relationship you expect. The menu offers standard Cohen's d values, but you can enter a custom number when you have pilot data, prior literature, or a domain-specific minimum effect that would be considered meaningful. A large expected effect yields a smaller required sample because it is easier to detect. A small expected effect yields a larger required sample because the signal is harder to separate from noise. The baseline value and standard deviation fields help you think in real units, not just standardized ones. If your standard deviation is large relative to the size of the effect you care about, the study will usually need more participants.
Finally, enter the constraints that turn a statistical plan into an operational plan. Dropout rate reflects participants who do not complete the study. Compliance rate reflects the share of recruited participants who follow the protocol well enough for the intended analysis. Those two fields matter because the number you need to recruit is often larger than the number you need to analyze. If you also enter a budget and cost per participant, the results panel can compare your projected cost with your available resources. That makes this calculator useful not only for protocol writing, but also for grant planning, ethics submissions, and internal project scoping.
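A minimal sketch of that operational step follows. It assumes dropout and compliance act independently, so the usable fraction of recruits is (1 - dropout) × compliance; the calculator may combine them differently. The function name and all input values are hypothetical.

```python
from math import ceil


def recruitment_plan(n_analyzed: int, dropout_rate: float, compliance_rate: float,
                     cost_per_participant: float, budget: float) -> dict:
    """Inflate an analyzable sample to a recruitment target and cost it."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    if not 0 < compliance_rate <= 1:
        raise ValueError("compliance_rate must be in (0, 1]")
    # Assumption: losses from dropout and non-compliance multiply.
    usable_fraction = (1 - dropout_rate) * compliance_rate
    n_recruit = ceil(n_analyzed / usable_fraction)
    projected_cost = n_recruit * cost_per_participant
    return {
        "recruit": n_recruit,
        "projected_cost": projected_cost,
        "within_budget": projected_cost <= budget,
        "shortfall": max(0.0, projected_cost - budget),
    }


# Hypothetical inputs: 100 analyzable participants, 20% dropout,
# 90% compliance, 50 per participant, budget of 6000.
print(recruitment_plan(100, 0.20, 0.90, 50.0, 6000.0))
```

In this made-up scenario the statistical target fits the budget but the recruitment-adjusted target does not.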
Formula, intuition, and assumptions
A standard way to think about sample-size planning is that required n grows when you demand stricter evidence, higher power, or sensitivity to smaller effects. In many introductory settings, the relationship is summarized with a continuous-outcome approximation like the one below. On this page, that familiar structure is used as the planning backbone and then adjusted for study type, group count, attrition, and compliance. Because the calculator is meant for rapid scenario testing, it should be interpreted as a practical estimate rather than a substitute for protocol-grade modeling of every design nuance.
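A common textbook form of that approximation is shown here. It assumes two equal-sized independent groups, a continuous outcome, and a two-tailed test. Read it as a sketch of the familiar structure, since the calculator's exact internals may differ.

$$
n_{\text{per group}} \approx \frac{2\,\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}\,\mathrm{SD}^{2}}{\Delta^{2}}
$$

Here $1-\beta$ is the desired power. Writing the standardized effect as $d = \Delta / \mathrm{SD}$ gives the equivalent form

$$
n_{\text{per group}} \approx \frac{2\,\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}}{d^{2}}
$$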
The moving parts are straightforward. The z-terms represent the evidence threshold implied by alpha and the protection against false negatives implied by your power choice. SD represents expected variability in the outcome. Δ represents the difference you want to detect in real units. More noise makes the target difference smaller relative to the variability, so the study needs more observations. A bigger true difference stands out more clearly, so the study can be smaller. When you choose an effect size such as Cohen's d, you are describing that difference after standardizing by variability, that is, d = Δ / SD. The calculator then translates the planning problem into a required sample estimate and adjusts it for the selected test family.
There are also important assumptions behind any quick calculator. The estimate here is most defensible when your groups are roughly balanced, the primary outcome is clearly defined, and the design does not involve heavy clustering or complicated repeated-measures dependence. If your study has unequal allocation, multiple primary endpoints, strong baseline covariate adjustment, adaptive interim looks, hierarchical sampling, or time-to-event outcomes, a specialized power-analysis workflow is more appropriate. Even in those cases, however, a fast approximation can still be useful as a conversation starter because it reveals the basic scale of the study before more advanced modeling begins.
Worked example
Imagine you are planning a two-group experiment comparing a new intervention with a control condition. You choose a two-tailed test, alpha of 0.05, and 80% power. Based on earlier work, you think a medium effect is plausible, so you select Cohen's d = 0.50. Suppose the expected standard deviation of your outcome is 15 units. That means a medium standardized effect corresponds to a difference of roughly 7.5 units on the original measurement scale, because 0.50 multiplied by 15 equals 7.5. If a 7.5-unit improvement would be meaningful in your field, this is a reasonable planning target.
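The numbers in this example can be run through a normal-approximation calculation in raw units. The sketch below assumes two equal-sized independent groups and is not necessarily how the calculator computes its result. A t-based calculation typically adds a participant or two per group.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_raw(delta: float, sd: float,
                    alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n per group to detect a raw difference `delta` (two-tailed)."""
    if delta == 0:
        raise ValueError("delta must be non-zero")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # evidence threshold
    z_power = z.inv_cdf(power)           # protection against false negatives
    return ceil(2 * (z_alpha + z_power) ** 2 * sd ** 2 / delta ** 2)


sd = 15.0          # expected standard deviation from the example
d = 0.50           # medium standardized effect
delta = d * sd     # difference in original units
n = n_per_group_raw(delta, sd)
print(f"delta = {delta} units, n per group = {n}, total = {2 * n}")
```

Under these assumptions the example works out to 63 participants per group, or 126 in total, before any allowance for dropout or non-compliance.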
Now add operational realism. If you expect 10% dropout and 85% compliance, the number you must recruit will be larger than the number you hope to analyze. That distinction is easy to miss in early planning, but it matters a great deal when you are booking staff time, ordering supplies, or projecting recruitment duration. You may discover that the raw statistical sample size looks manageable while the recruitment-adjusted number is substantially larger. That is precisely the kind of insight a planning calculator should surface early, before those assumptions are baked into a timeline or funding request.
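As a rough illustration, suppose the analyzable target for this example is 126 participants (63 per group under a normal approximation). Suppose also that dropout and compliance act independently, so the usable fraction of recruits is their product. Both are assumptions, and the calculator may combine the two rates differently.

$$
N_{\text{recruit}} = \left\lceil \frac{N_{\text{analyzed}}}{(1 - \text{dropout}) \times \text{compliance}} \right\rceil
= \left\lceil \frac{126}{0.90 \times 0.85} \right\rceil
= \lceil 164.7 \rceil = 165
$$

That is about 31% more people than the analyzable target.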
The example also shows why sensitivity analysis matters. If you repeat the same calculation with a smaller expected effect, such as d = 0.30, the required sample increases sharply: under the usual approximation, required n scales with 1/d², so moving from d = 0.50 to d = 0.30 multiplies it by about 2.8. Nothing about the intervention changed; only your expectation of how detectable it is changed. This is why experienced researchers treat effect-size assumptions cautiously. Optimistic assumptions make studies look affordable. Conservative assumptions make studies more robust. The best planning process is usually to run both and decide whether the study remains worthwhile under the less favorable scenario.
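A small sweep makes the sensitivity concrete. This sketch again uses the two-group normal approximation as a stand-in for the calculator's own computation. The grid of effect sizes and power levels is illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n per group for standardized effect d (two-tailed)."""
    if d == 0:
        raise ValueError("d must be non-zero")
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil(2 * z_sum ** 2 / d ** 2)


def sensitivity_table(effects, powers, alpha=0.05):
    """Map (power, d) pairs to the approximate n per group."""
    return {(p, d): n_per_group(d, alpha, p) for p in powers for d in effects}


table = sensitivity_table(effects=(0.80, 0.50, 0.30, 0.20), powers=(0.80, 0.90))
for (power, d), n in table.items():
    print(f"power={power:.2f}  d={d:.2f}  n per group={n}")
```

Reading down the rows shows which assumption drives the recruitment target. In this grid, halving the effect size raises the requirement far more than moving from 80% to 90% power.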
How to read the result
After you click calculate, the page reports the estimated minimum sample size per group, the total sample across groups, and the recruitment-adjusted total after dropout and compliance. The result is best understood as a planning anchor, not a guarantee. If the recommended sample size is very small, that can mean your expected effect is unusually large, not necessarily that the study is easy. If the recommended sample size is very large, that does not automatically mean the idea is bad; it may mean the outcome is noisy, the meaningful effect is subtle, or the evidence standard is appropriately strict for the stakes involved.
Pay special attention to the minimum detectable effect line in the detailed analysis area. That value tells you, roughly, what size difference your planned study can detect given the rest of your assumptions. It acts as a reality check. If your scientific or clinical question is about smaller effects than the study can realistically detect, you need a larger sample, lower measurement variability, a more efficient design, or a narrower question. The recommendations box then translates those numbers into practical guidance about power, attrition, compliance, and budget. Taken together, the goal is not just to print a number, but to help you decide whether the design you are imagining is actually coherent.
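One way to approximate a minimum detectable effect is to invert the two-group normal approximation and solve for the standardized effect at a fixed sample size. The sketch below assumes that approach. The page does not state how its own value is computed, so treat the function as illustrative.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_d(n_per_group: int, alpha: float = 0.05,
                         power: float = 0.80) -> float:
    """Smallest standardized effect detectable with the stated power (two-tailed)."""
    if n_per_group <= 0:
        raise ValueError("n_per_group must be positive")
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return z_sum * sqrt(2 / n_per_group)


sd = 15.0  # hypothetical standard deviation, used to convert back to raw units
for n in (40, 63, 100):
    d_min = minimum_detectable_d(n)
    print(f"n per group={n}: d_min={d_min:.2f}, about {d_min * sd:.1f} raw units")
```

If the printed threshold is larger than the smallest difference that would matter to you, the planned study cannot answer the question as posed.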
Choosing stronger inputs for a better plan
The hardest part of sample-size planning is usually not the calculator itself. It is choosing inputs that are honest enough to be useful. Effect size should ideally come from meta-analyses, closely related studies, historical controls, or pilot data rather than hope. Standard deviation should come from the same outcome measure you plan to analyze, not from a vaguely similar metric that happens to be easier to find. Dropout and compliance should be based on how demanding the protocol really is. A short online survey and a six-month clinical follow-up should not share the same attrition assumption just because it would make the spreadsheet cleaner.
One practical habit is to define a minimum meaningful effect before you look at the output. Ask yourself what difference would change a scientific conclusion, a policy decision, or a product choice. Then compare that threshold with the calculator's implied detectable effect. If the study is only powered for changes larger than what would matter, the experiment may be structurally incapable of answering the real question. That is not a statistical failure at the end of the study; it is a design failure at the beginning.
It is also wise to think about design efficiency before simply increasing recruitment. Better measurement reliability, tighter eligibility criteria, reduced protocol complexity, and paired or blocked designs can sometimes lower the required sample more effectively than trimming statistical standards. In other words, sample size is not the only lever. Variability, adherence, and outcome quality are levers too. This is why experienced teams revisit the design itself when the first power estimate looks unaffordable.
As a final check, document your assumptions in plain language. Write down where the effect size came from, why the power target was chosen, what dropout scenario you planned for, and what budget constraint mattered. That record is valuable later when reviewers, collaborators, or future-you ask why the study was designed the way it was. A transparent estimate is far more useful than a mysterious number copied from an old protocol.
- Use prior evidence when possible: published estimates beat intuition.
- Stress-test the plan: try a smaller effect size and worse retention to see how fragile the design is.
- Keep the primary outcome central: the sample should be justified for the main question, not every possible secondary analysis.
- Escalate to specialist tools when needed: clustered, adaptive, longitudinal, or survival designs usually need more detailed modeling.