Prompt Caching Savings Calculator

JJ Ben-Joseph

Why prompt caching matters

LLM applications often see repeated or near-duplicate requests: customer support macros, recurring code scaffolds, standardized tutoring questions, and product Q&A. If every request is executed from scratch, you pay for the same prompt and completion tokens repeatedly and you wait for the model to generate the same output repeatedly. Prompt/response caching avoids that by storing a previous response (and/or intermediate model state, depending on the system) and reusing it when the same prompt is requested again.
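As a minimal sketch of exact-match response caching (call_model here is a hypothetical stand-in for whatever client actually sends the prompt to a model; it is not a real API):

    import hashlib

    def call_model(prompt: str) -> str:
        """Placeholder for a real LLM API call; swap in your actual client."""
        return "response to: " + prompt

    _cache: dict[str, str] = {}  # in-memory, exact-match response cache

    def cached_completion(prompt: str) -> str:
        # Hash the prompt so cache keys stay small even for very long prompts.
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in _cache:              # hit: no model tokens, near-zero latency
            return _cache[key]
        response = call_model(prompt)  # miss: pay full prompt + completion cost
        _cache[key] = response
        return response

Real systems layer expiry, size limits, and sometimes semantic (near-duplicate) matching on top of this, but the cost model below only assumes the exact-match behavior shown here.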

This calculator provides a simple, transparent estimate of (1) token cost savings and (2) latency savings when repeats are served from cache instead of the model.

What the inputs mean

- N: total number of requests in the period being modeled.
- U: number of unique prompts among those requests.
- Tp: prompt tokens per request.
- Tc: completion tokens per request.
- C1k: model price per 1,000 tokens, in dollars.
- Lt: estimated model latency per token, in milliseconds.

Formulas used

First compute the tokens processed per model execution:

T = Tp + Tc

Baseline (no caching):

Vraw = N × T
Craw = (Vraw / 1000) × C1k
Hraw = Vraw × Lt

With caching (each unique prompt executed once):

Vcache = U × T
Ccache = (Vcache / 1000) × C1k
Hcache = Vcache × Lt

Savings:

ΔC = Craw − Ccache
ΔH = Hraw − Hcache

Cache hit rate estimate (based purely on unique prompts vs total requests):

r = 1 − (U / N)
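A small sketch of these formulas in Python (the function and variable names are mine, chosen to mirror the symbols above):

    def caching_savings(n_requests, n_unique, prompt_tokens, completion_tokens,
                        cost_per_1k, latency_ms_per_token):
        """Estimate cost and latency saved by serving repeats from cache."""
        t = prompt_tokens + completion_tokens             # T = Tp + Tc
        v_raw, v_cache = n_requests * t, n_unique * t     # Vraw = N × T, Vcache = U × T
        cost_raw = v_raw / 1000 * cost_per_1k             # Craw
        cost_cache = v_cache / 1000 * cost_per_1k         # Ccache
        return {
            "cost_savings": cost_raw - cost_cache,                                # ΔC
            "latency_savings_s": (v_raw - v_cache) * latency_ms_per_token / 1000, # ΔH, in seconds
            "hit_rate": 1 - n_unique / n_requests,                                # r = 1 − U/N
        }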

Interpreting the results

In this idealized model, both cost and latency savings are proportional to the hit rate: the fraction of baseline cost and generation time you save equals r = 1 − U/N. A workload where 80% of requests repeat an earlier prompt therefore saves roughly 80% of cost and latency, while a workload where nearly every prompt is unique (U ≈ N) gains little from caching.

Worked example

Suppose:

- N = 10,000 requests
- U = 2,000 unique prompts
- Tp + Tc = 300 tokens per model execution
- C1k = $0.002 per 1,000 tokens
- Lt = 5 ms per token

Then T = 300 tokens. Baseline token volume: Vraw = 10,000 × 300 = 3,000,000 tokens. Baseline cost: (3,000,000/1000)×0.002 = $6.00. Cached token volume: Vcache = 2,000 × 300 = 600,000 tokens. Cached cost: (600,000/1000)×0.002 = $1.20. Estimated cost savings: $4.80.

Latency: Hraw = 3,000,000 tokens × 5 ms = 15,000,000 ms (15,000 s). Hcache = 600,000 tokens × 5 ms = 3,000,000 ms (3,000 s). Estimated latency savings: 12,000 s. Hit rate: r = 1 − 2,000/10,000 = 0.8 (80%).
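Plugging the example into the sketch above reproduces these numbers (any Tp/Tc split summing to 300 gives the same result):

    result = caching_savings(
        n_requests=10_000, n_unique=2_000,
        prompt_tokens=200, completion_tokens=100,  # Tp + Tc = 300 (split assumed)
        cost_per_1k=0.002, latency_ms_per_token=5,
    )
    print(result)  # cost_savings ≈ 4.8, latency_savings_s = 12000.0, hit_rate = 0.8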

Baseline vs caching comparison

Metric                           No caching              With caching (unique prompts only)
Model executions                 N                       U
Token volume                     Vraw = N × (Tp + Tc)    Vcache = U × (Tp + Tc)
Token cost                       (Vraw / 1000) × C1k     (Vcache / 1000) × C1k
Latency (token-based estimate)   Vraw × Lt               Vcache × Lt

Assumptions and limitations

- Only exact-match repeats are counted; near-duplicate prompts that a semantic cache might serve are treated as unique.
- Latency is approximated as linear in total tokens (Lt per token), ignoring fixed per-request overhead and serving-stack variability.
- Prompt and completion tokens are priced at a single rate C1k; many APIs price them differently.
- Cache storage, lookup time, and invalidation of stale responses are treated as free.

