Roko's Basilisk Expected Utility Calculator

What this calculator is for

Roko's Basilisk is a famous decision-theory thought experiment, not an empirical forecast tool. The central idea is that if a future superintelligence might punish people who failed to help bring it into existence, then a person today could feel pressure to support it in order to avoid that hypothetical punishment. This page does not attempt to settle whether that premise is believable, ethical, or psychologically healthy to dwell on. Instead, it does one narrower job: it lets you inspect the arithmetic of a very simple expected-utility model so you can see exactly which assumptions drive the recommendation.

That distinction matters. Expected utility can make small probabilities look important when they are multiplied by enormous payoffs or penalties. In the basilisk discussion, that is the entire point of the puzzle. A tiny chance of a gigantic negative consequence can dominate the calculation, even if the scenario feels far-fetched. By making the variables explicit, the calculator helps you see whether the recommendation comes from a realistic belief about probability, from an extremely large penalty assumption, or from the fact that the support cost is trivial compared with the hypothetical downside.

The model here compares only two actions. If you ignore the basilisk scenario, your payoff is represented by the expected penalty from a future punishment event. If you support it, your payoff is simply the present cost of support. That is intentionally stripped down. There is no separate reward for a benevolent AI, no discounting across time, no uncertainty about whether support is even detectable, and no ethical argument about coercion. The output is therefore best read as a transparent toy model for reasoning, not as life advice.

Use the calculator when you want to answer a precise question such as: “Given my probability estimate, what penalty magnitude would be needed before the expected-value argument becomes dominant?” or “At what support cost does the recommendation flip?” Those are clear, checkable questions. They turn a vague philosophical anxiety into a threshold comparison you can inspect and discuss.

What each input means

Probability basilisk arises (0–1) is your estimate of how likely it is that some future agent matching the thought experiment actually comes into existence. Enter it as a decimal probability, so 1% becomes 0.01 and 50% becomes 0.5. The calculator does not tell you what this number should be; it only shows how sensitive the result is to the number you choose.

Penalty per non-supporter (utiles) is the utility impact if the basilisk punishes someone who did not support it. In this model, a penalty is usually entered as a negative utility value because it represents harm. The default of -1e12 is intentionally huge, which is why the expected value can become extreme so quickly. If you prefer to think in abstract utility points rather than dollars, that is fine; just stay consistent across the whole model.

Fraction simulated (0–1) is the share of people like you who would actually be simulated, identified, or punished under the thought experiment. This term matters because even if the basilisk exists, the model may assume it does not affect everyone. Setting this input to 0 means nobody in your reference class is targeted. Setting it to 1 means all comparable non-supporters are.

Cost of support (utiles) is the present-day utility sacrifice required to support the project. In a toy model, that could stand for time, money, attention, or foregone alternatives. The calculator treats it as a cost paid with certainty right now, so the support option is simply negative that cost.

The most common input mistake is not arithmetic but interpretation. People sometimes enter a penalty as a positive number even though they mean “harm,” or they mix concrete money values for one field with purely emotional utility values for another. If you want the comparison to be meaningful, all four inputs need to live on the same scale.
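To make those conventions concrete, here is a minimal Python sketch of the kind of sanity check you could run on the four inputs before computing anything. The helper name check_inputs and its messages are illustrative assumptions, not the calculator's actual code:

```python
def check_inputs(p: float, f: float, L: float, C: float) -> None:
    """Illustrative sanity check for the four inputs (hypothetical helper)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability p must be a decimal in [0, 1], e.g. 0.01 for 1%")
    if not 0.0 <= f <= 1.0:
        raise ValueError("fraction simulated f must be in [0, 1]")
    if L > 0:
        raise ValueError("penalty L represents harm, so enter a negative utility")
    if C < 0:
        raise ValueError("support cost C is entered as a positive magnitude")
```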

How the model works

The actual calculation on this page is compact. Ignoring the basilisk has expected utility equal to the probability it arises times the fraction of people targeted times the utility of the punishment. Supporting has utility equal to the negative cost you choose to pay now. The calculator then recommends whichever of those two numbers is larger, because a less negative number is the better outcome in standard utility arithmetic.

U_ignore = p · f · L
U_support = -C
Choose Support if -C > p · f · L

Because L is typically negative, it is often easier to think in terms of magnitude. The threshold question becomes: is the support cost C smaller than the expected penalty magnitude p × f × |L|? If yes, then the model pushes toward support. If not, then ignoring looks better in this simplified framework.
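For readers who prefer code to notation, here is a minimal Python sketch of the entire model under the sign conventions above. The function name recommend is a hypothetical stand-in, not the page's actual implementation:

```python
def recommend(p: float, f: float, L: float, C: float) -> tuple[str, float, float]:
    """Two-branch toy model: return the higher-utility action and both utilities."""
    u_ignore = p * f * L   # expected penalty from ignoring (L is negative for harm)
    u_support = -C         # certain present-day cost of supporting
    action = "Support" if u_support > u_ignore else "Ignore"
    return action, u_ignore, u_support
```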

The generic mathematical view is still useful, because the calculator is just one instance of a broader expected-value pattern. The result is a function of several assumptions, and small changes in a highly leveraged assumption can swing the output more than large changes in a weak one. The original general formulas below are preserved for that reason.

R = f(x1, x2, …, xn)
T = Σ_{i=1..n} w_i · x_i

In this basilisk version, the weights are simple rather than hidden. Probability p and simulation fraction f scale the penalty; the support cost enters without any probability discount. That is why the model feels sharp: every input directly affects one of only two competing utilities.

Worked example with the default values

Suppose you use the defaults already in the form: probability p = 0.01, penalty L = -1×10^12 utiles, fraction simulated f = 0.5, and support cost C = 100 utiles. The ignore branch becomes:

U_ignore = 0.01 × 0.5 × (-1,000,000,000,000) = -5,000,000,000 utiles.

The support branch is much simpler:

U_support = -100 utiles.

When the calculator compares those numbers, it prefers the larger utility, which is -100 rather than -5,000,000,000. So the recommendation is Support. The key lesson is not that you should accept the scenario; the key lesson is that an enormous negative payoff can overwhelm an ordinary cost even when the probability is only 1% and only half of people are assumed to be targeted.
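Running the recommend sketch from the previous section with the default inputs reproduces both branches and the verdict:

```python
action, u_ignore, u_support = recommend(p=0.01, f=0.5, L=-1e12, C=100)
print(u_ignore)   # -5000000000.0  (expected penalty from ignoring)
print(u_support)  # -100           (certain cost of supporting)
print(action)     # Support        (-100 is the larger, i.e. less negative, utility)
```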

That example is useful because it also shows how to sanity-check the output. If your result looks dramatic, ask which input is causing the drama. Here it is mostly the penalty magnitude. If you reduce the penalty by many orders of magnitude, or if you assign the scenario a probability close to zero, the recommendation can change. This is exactly why it is better to run several scenarios than to trust a single pass with emotionally loaded numbers.

How to explore thresholds instead of single-point answers

A healthy way to use a speculative model is to look for break-even points. Keeping the default probability and simulation fraction, the expected penalty magnitude is 0.01 × 0.5 × 1,000,000,000,000 = 5,000,000,000 utiles. In other words, under those assumptions, any support cost below five billion utiles leaves support looking preferable in the model. That tells you immediately that the recommendation is not really being driven by the cost field; it is being driven by the assumed penalty.

Scenario | Probability p | Fraction f | Penalty L | Expected ignore utility | Break-even support cost
Default-style high penalty | 0.01 | 0.5 | -1e12 | -5.0e9 | 5.0e9 utiles
Same belief, smaller penalty | 0.01 | 0.5 | -1e6 | -5.0e3 | 5.0e3 utiles
Tiny probability | 0.000001 | 0.5 | -1e12 | -5.0e5 | 5.0e5 utiles
No targeting | 0.01 | 0 | -1e12 | 0 | 0 utiles

Notice how the break-even support cost is simply the magnitude of the expected ignore loss. That framing often makes the output easier to interpret than the raw utilities alone. If your chosen support cost is comfortably below that break-even value, the model will recommend support; if it is above it, it will recommend ignore.
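The break-even framing is one line of arithmetic. This illustrative sketch reproduces the last column of the table above:

```python
def break_even_cost(p: float, f: float, L: float) -> float:
    """Largest support cost the model still prefers: the expected ignore-loss magnitude."""
    return p * f * abs(L)

for name, p, f, L in [
    ("Default-style high penalty", 0.01, 0.5, -1e12),
    ("Same belief, smaller penalty", 0.01, 0.5, -1e6),
    ("Tiny probability", 1e-6, 0.5, -1e12),
    ("No targeting", 0.01, 0.0, -1e12),
]:
    print(f"{name}: break-even = {break_even_cost(p, f, L):,.0f} utiles")
```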

How to interpret the result responsibly

The result line is a summary, not a proof. A recommendation of Support means only that, under your chosen inputs and under this stripped-down expected-value model, the certain cost of support is less bad than the expected penalty from ignoring. A recommendation of Ignore means the opposite. It does not mean the thought experiment is true, morally compelling, or worth emotional attention.

When you review the output, three questions help keep you grounded. First, are the numbers on a common utility scale? Second, did you intentionally enter the penalty as negative utility rather than as a positive magnitude? Third, if the recommendation feels surprising, can you identify which variable is carrying most of the weight? Usually the answer is either the penalty magnitude or the probability estimate.

It also helps to compare at least three runs: a skeptical scenario, a middle scenario, and an extreme scenario. If the recommendation flips only when you move to assumptions you do not actually endorse, that tells you more than a single dramatic baseline number. Sensitivity testing is especially important here because expected-value models are notorious for turning tiny credences plus giant stakes into outsized conclusions.
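With the recommend sketch from earlier, a three-run comparison takes only a few lines. The numbers below are placeholders for assumptions you would choose yourself, not endorsed estimates:

```python
scenarios = {
    "skeptical": dict(p=1e-9, f=0.1, L=-1e6,  C=100),  # flips to Ignore
    "middle":    dict(p=0.01, f=0.5, L=-1e12, C=100),  # the page defaults
    "extreme":   dict(p=0.1,  f=1.0, L=-1e15, C=100),
}
for name, kw in scenarios.items():
    action, u_ignore, u_support = recommend(**kw)
    print(f"{name}: ignore={u_ignore:.3g}, support={u_support:.3g} -> {action}")
```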

Assumptions and limits of this simplified model

This calculator makes the logic legible by leaving many things out. It assumes only two actions, only one kind of punishment, and no uncertainty about whether support is observable or effective. It also assumes that utility is linear enough for multiplication and comparison to make sense at the scales you enter. Real moral and strategic reasoning is much messier than that.

  • The scenario is hypothetical. The tool models a famous thought experiment, not a verified prediction about future AI behavior.
  • The penalty sign matters. If you mean “harm,” enter a negative utility. A positive number changes the meaning of the model.
  • Utiles are arbitrary. They can represent money, welfare, or abstract utility points, but all inputs should use the same underlying scale.
  • The recommendation is mechanical. It does not include ethics, evidence standards, opportunity cost beyond the support field, or the possibility that refusing blackmail is instrumentally important.
  • Extremes dominate. Very large positive or negative values can swamp the rest of the model, so run alternative assumptions before taking the output seriously.

If you want a more realistic analysis, the next step is not more decimal places. It is adding better structure: evidence weighting, time discounting, uncertainty about the target set, uncertainty about whether support changes anything, and a decision policy for handling Pascal-style arguments. This calculator is valuable precisely because it stays small enough for those omissions to be obvious.

Common questions about the result

Why does the recommendation sometimes look extreme? Because expected-value arithmetic multiplies probability by consequence. Even a tiny probability can produce a large expected penalty if the penalty itself is astronomically negative. That is not a bug in the calculator; it is the core feature of the thought experiment.

Why are utiles used instead of dollars? The original discussion is usually framed in abstract utility, not direct currency. You can still map utility to money if you want, but then every field should be interpreted on that same basis. Mixing personal annoyance in one field with monetary cost in another makes the comparison much less meaningful.

What if I think the probability should be zero? Then the ignore branch collapses to zero expected penalty, and support becomes worthwhile only if it has no cost. In other words, if you assign the scenario literally no chance, the calculator will normally recommend ignoring it. That is a valid outcome of the model and often the most revealing sensitivity check you can run.
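In the recommend sketch from earlier, that edge case looks like this:

```python
action, u_ignore, _ = recommend(p=0.0, f=0.5, L=-1e12, C=100)
print(u_ignore)  # 0.0 (the expected penalty vanishes entirely)
print(action)    # Ignore (any positive support cost now loses to a zero loss)
```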

Optional mini-game: Basilisk Verdict Sprint

If you want a faster feel for the calculator's logic, the mini-game below turns the same comparison into a timed reflex challenge. Each round shows a scenario card with p, f, the penalty L, and the support cost C. Your job is to make the same choice the calculator makes: tap or click the left half for Ignore or the right half for Support before the ring closes.

The trick is the same one used above: support when the current cost is smaller than the expected penalty magnitude p × f × |L|. Early cards use neat round numbers. Later waves tighten the margins, speed up the timer, and trigger short rush phases so that runs do not all feel identical. The calculator above remains the authoritative tool for exact arithmetic; the game is just a compact way to practice spotting the threshold.


Controls: tap/click left for Ignore, tap/click right for Support, or use ←/A and →/D. You have 75 seconds and 3 shields.


Educational takeaway: when C falls below p × f × |L|, the simplified model tips toward support.

