Plan dataset labeling with time, cost, and staffing estimates
Dataset annotation is often the largest controllable cost in supervised ML. Whether you’re labeling images, text, audio, or video, you can usually predict the core labor by breaking the work into (1) how many annotation actions must be performed, (2) how long each action takes, and (3) how much each working hour costs. This calculator turns those inputs into estimated total labor hours, budget, and the number of annotators needed to hit a deadline.
Use it early—before you collect quotes or hire a labeling team—to sanity-check vendor bids and to compare scenarios such as single-pass labeling vs. multi-annotator consensus, or low vs. high quality-review overhead.
Inputs explained (what each field means)
Number of items
Items are the units to be labeled: an image, a text snippet, a 10‑second audio clip, a video frame, etc. If you have 10,000 images, your item count is 10,000.
Annotations per item
This captures how many separate labeling passes you want per item. Common reasons to set this above 1 include:
- Consensus labeling (e.g., 3 annotators label each item and you use majority vote).
- Multi-stage workflows (e.g., one person draws a box, another validates, a third corrects).
- Multiple label types (e.g., each item requires both a classification label and entity spans—if they are separate actions in your workflow).
Time per annotation (seconds)
The average productive time to complete one annotation action. If workers spend ~30 seconds to label one item (one pass), enter 30. If tasks vary widely, use a weighted average (or model best/expected/worst scenarios by running the calculator multiple times).
Annotator wage per hour
Enter the hourly cost you want to use for budgeting. For internal teams, a more realistic number is usually a fully loaded rate (wage + payroll taxes + benefits + equipment + overhead). For vendors, use the contracted hourly or per-task rate converted into an hourly equivalent.
Project deadline (days)
The calendar time window in days. The staffing estimate assumes each annotator contributes a fixed number of hours per day (see assumptions below). Shorter deadlines raise the required workforce.
Quality review overhead (%)
Quality assurance adds extra work beyond the raw labeling time: audits, re-labeling, adjudication, reviewer feedback loops, tool latency, and rework due to unclear guidelines. Overhead is modeled as a multiplier applied to annotation time. For example, 10% overhead means the effective time is 1.10× the base time.
Formulas used
The calculator uses the following relationships.
1) Total number of annotation actions
Annotations = Items × Annotations per item
2) Effective seconds per annotation (including overhead)
Effective seconds = Base seconds × (1 + Overhead% / 100)
3) Total labor hours
Total hours = (Annotations × Effective seconds) / 3600
4) Labor cost
Cost = Total hours × Hourly wage
5) Annotators needed to meet the deadline
Annotators needed = ceil( Total hours / (Hours per day × Deadline days) )
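These five relationships fit in a short function. Below is a minimal Python sketch of the calculation; the function and argument names (labeling_plan, overhead_pct, and so on) are illustrative, not part of any particular tool, and the 8 hours/day default mirrors the staffing assumption described later.

```python
import math

def labeling_plan(items, annotations_per_item, seconds_per_annotation,
                  overhead_pct, wage_per_hour, deadline_days, hours_per_day=8):
    """Apply formulas 1-5 to one labeling scenario."""
    annotations = items * annotations_per_item                             # (1)
    effective_seconds = seconds_per_annotation * (1 + overhead_pct / 100)  # (2)
    total_hours = annotations * effective_seconds / 3600                   # (3)
    cost = total_hours * wage_per_hour                                     # (4)
    annotators = math.ceil(total_hours / (hours_per_day * deadline_days))  # (5)
    return {
        "annotations": annotations,
        "total_hours": total_hours,
        "cost": cost,
        "annotators": annotators,
    }
```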
How to interpret the results
- Total hours is the estimated productive labor time (labeling + modeled review overhead). It is not the same as elapsed calendar time.
- Estimated cost is labor cost only, based on your hourly rate. If you pay per task, you can still use this as a cross-check by converting the per-task price to an implied hourly wage (a one-line conversion is sketched after this list).
- Annotators needed is the minimum whole number of full-time-equivalent annotators required to finish within the deadline under the assumptions (notably hours per day). If the result is 1.15, the calculator rounds up to 2 because you can’t staff 0.15 of a person.
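For the per-task cross-check, the conversion is simply the per-annotation price times the number of annotations completed per hour. A tiny sketch, with made-up example numbers:

```python
def implied_hourly_wage(price_per_annotation, effective_seconds_per_annotation):
    # Annotations completed per hour, times the per-annotation price
    return price_per_annotation * 3600 / effective_seconds_per_annotation

# Hypothetical quote: $0.15 per annotation at 33 effective seconds each
print(implied_hourly_wage(0.15, 33))  # ≈ 16.36 dollars/hour
```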
Worked example
Suppose you need to label 10,000 items. You require 3 annotations per item for consensus, each taking ~30 seconds. You expect 10% review overhead. Wage is $15/hour. Deadline is 30 days.
- Annotations: 10,000 × 3 = 30,000
- Effective seconds: 30 × (1 + 10/100) = 33 seconds
- Total hours: (30,000 × 33) / 3600 = 275 hours
- Cost: 275 × $15 = $4,125
- Annotators needed (assuming 8 hours/day): ceil(275 / (8 × 30)) = ceil(1.1458…) = 2
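Plugging the same numbers into the labeling_plan sketch from the formulas section reproduces these figures:

```python
plan = labeling_plan(items=10_000, annotations_per_item=3,
                     seconds_per_annotation=30, overhead_pct=10,
                     wage_per_hour=15, deadline_days=30, hours_per_day=8)
print(plan)
# annotations: 30000, total_hours: ~275, cost: ~4125, annotators: 2
```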
Scenario comparison
Below is a simple comparison using the same 10,000 items, 30 seconds per annotation, $15/hour, 30-day deadline, and 8 hours/day. Only consensus depth and overhead change.
| Scenario | Annotations per item | Review overhead | Total hours | Estimated cost | Annotators needed |
|---|---|---|---|---|---|
| Single pass, light QA | 1 | 10% | ~91.7 | ~$1,375 | 1 |
| 3-pass consensus, light QA | 3 | 10% | 275 | $4,125 | 2 |
| 3-pass consensus, heavy QA | 3 | 30% | 325 | $4,875 | 2 |
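The whole table can be regenerated by looping the earlier labeling_plan sketch over the three scenarios; only annotations per item and overhead change:

```python
scenarios = [
    ("Single pass, light QA", 1, 10),
    ("3-pass consensus, light QA", 3, 10),
    ("3-pass consensus, heavy QA", 3, 30),
]
for name, passes, overhead_pct in scenarios:
    p = labeling_plan(items=10_000, annotations_per_item=passes,
                      seconds_per_annotation=30, overhead_pct=overhead_pct,
                      wage_per_hour=15, deadline_days=30, hours_per_day=8)
    print(f"{name}: {p['total_hours']:.1f} h, ${p['cost']:,.0f}, "
          f"{p['annotators']} annotator(s)")
```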
Assumptions & limitations (read before budgeting)
- Hours per day: The staffing estimate assumes an 8-hour workday per annotator. If your team works 6 hours/day of productive labeling (breaks, meetings, tool delays), the true headcount required may be higher.
- Productive time vs. paid time: “Time per annotation” is assumed to be productive labeling time. Training, calibration, guideline updates, and management time are not explicitly modeled unless you include them in overhead.
- Constant throughput: The model assumes stable speed across the project. In practice, throughput changes due to learning curves, fatigue, changing label definitions, or domain shifts.
- Rework and dispute resolution: If your workflow includes adjudication for disagreements, escalation queues, or re-labeling after QA failures, overhead may be non-linear and higher than a simple percentage.
- Task heterogeneity: If some items are much harder than others (e.g., long documents vs. short ones), averages can hide risk. Consider modeling multiple buckets (easy/medium/hard) separately and summing hours.
- Tooling and platform costs: This calculator estimates labor cost only. It does not include labeling-platform fees, compute, storage, security reviews, or vendor PM charges.
- Staffing granularity: “Annotators needed” rounds up to whole people and assumes they are fully available to your project during the period. Part-time staffing requires adjustment.
- Calendar vs. business days: Deadline is treated as days of work capacity. If you mean calendar days but only work weekdays, adjust the deadline accordingly (e.g., 30 calendar days ≈ 22 workdays).
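To see how much the first and last assumptions matter, the hypothetical labeling_plan function from the formulas section can be re-run with the worked-example inputs but only 6 productive hours per day and 22 workdays:

```python
tighter = labeling_plan(items=10_000, annotations_per_item=3,
                        seconds_per_annotation=30, overhead_pct=10,
                        wage_per_hour=15, deadline_days=22, hours_per_day=6)
print(tighter["annotators"])  # ceil(275 / (6 * 22)) = ceil(2.08...) = 3, not 2
```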
Practical tips
- Pilot first: Measure time per annotation on a representative sample before committing to a budget.
- Model sensitivity: Try ±20–30% on time per annotation and overhead to see how fragile your plan is (a sweep like the one sketched after this list makes this quick).
- Separate QA roles when needed: If reviewers are paid at a different rate, model their work as additional “annotation actions” or increase wage to a blended rate.
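A rough sensitivity sweep, again using the hypothetical labeling_plan function from the formulas section, makes the second tip concrete. The grid values below are arbitrary examples around the worked-example baseline:

```python
for seconds in (24, 30, 39):        # roughly -20% / baseline / +30% per annotation
    for overhead_pct in (10, 20, 30):
        p = labeling_plan(items=10_000, annotations_per_item=3,
                          seconds_per_annotation=seconds, overhead_pct=overhead_pct,
                          wage_per_hour=15, deadline_days=30)
        print(f"{seconds}s @ {overhead_pct}% QA: {p['total_hours']:.0f} h, "
              f"${p['cost']:,.0f}, {p['annotators']} annotator(s)")
```

If the headcount or budget swings sharply across this grid, budget for the pessimistic end or run a pilot before committing.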