Verification willingness audit

Earthly Promise Test Field

Put concrete God-claims on a field and see whether they are actually being exposed to ordinary checks, or kept safe by flexible explanations after the fact.

How to use this field

Choose a claim, pick the strongest test you would actually allow, then set whether a clean failure would lower your confidence. The score and promise map move right when the claim is made clear, risky, public, and checkable.

Select escape hatches only if you would really use them. Each one moves the claim back left because it makes the claim harder to lose.

Active promise: Answered prayer. Prayer produces outcomes beyond ordinary timing and recovery.

Test strength: Anecdote. Barely moves the claim.

Escape hatches: 0 active. No protective responses selected.

Field position: Protected. Protected from ordinary checks.

Active promise

Answered prayer

Prayer changes outcomes beyond ordinary timing, recovery, or coincidence.

Score: 9 / 100

Current test: Anecdote

Study: 8/100

Run: 35%

Miss: 25%

Drag: 0

Evidence gradient

From story to robust science

The score moves right as the claim accepts clearer outcomes, full miss-counting, comparison groups, blinding, preregistration, replication, and large samples.

01 Anecdote

Memorable hit, little visibility into misses.

02 Complete log

Requests, dates, outcomes, and misses are kept.

03 Records

Medical, public, or timestamped data replace memory.

04 Matched comparison

Similar cases are compared instead of isolated.

05 Blinded rules

Outcomes and scoring are set before results are known.

06 Large replicated study

Very large samples, independent teams, and repeated checks.

Bad study / better study

Healing

Bad: testimony after treatment

Only the dramatic recovery is shown, with no original diagnosis, treatment history, or miss count.

Better: consecutive case review

Every submitted case includes records, baseline severity, treatment, prayer timing, and independent scoring.

Prayer

Bad: one answered request

A memorable hit is counted while similar unanswered requests disappear from the sample.

Better: logged requests with controls

Requests are timestamped, misses stay in, comparable cases are matched, and reviewers do not know prayer status.

Prophecy

Bad: vague impressions after the event

The meaning is settled after the outcome, so almost any later event can be made to fit.

Better: locked prediction registry

Predictions are dated, specific, scored by fixed rules, and compared against ordinary forecasting baselines.

Behavior

Bad: changed-life stories

The group reports its own best examples without checking comparable communities or public outcomes.

Better: matched public records

Crime, divorce, obesity, giving, safeguarding, and restitution are compared with controls for culture and demographics.
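The "locked prediction registry" entry above can be made concrete with a fixed scoring rule. The following is a minimal sketch, assuming each prediction is logged in advance as a probability and later marked as happened or not. The Brier score is an assumed choice of rule (the source does not name one), and the registry entries and the 50% base rate are invented for illustration.

```python
def brier_score(forecasts):
    """Mean squared gap between stated probability and outcome (0 is perfect)."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)

# Hypothetical registry: (probability stated before the event, 1 if it happened else 0).
registry = [(0.9, 1), (0.8, 0), (0.7, 1), (0.9, 0)]

# Ordinary forecasting baseline: always state an assumed 50% base rate.
baseline = [(0.5, outcome) for _, outcome in registry]

claimed = brier_score(registry)    # about 0.3875
ordinary = brier_score(baseline)   # 0.25

# Lower is better, so these predictions do worse than the plain baseline.
beats_baseline = claimed < ordinary
print(claimed, ordinary, beats_baseline)
```

Because the rule and the baseline are fixed before outcomes are known, a confident miss costs more than a cautious one, and the meaning of a prediction cannot be settled afterward.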

Step 1

Choose a promise

Each icon represents a common earthly claim about a personal God. Select one, then decide how much real-world testing it is allowed to face.

Claim-type presets

Use a preset when the debate starts with a broad claim such as healing, protection, wisdom, prophecy, or moral transformation.

Step 2

Choose a study

Can the promise survive this degree of scrutiny?

These are intentionally ordinary starting points: logs, records, matched comparisons, blinded review, and small pilots before large studies. The point is not to demand perfection; it is to see whether the claim is allowed to meet normal public checks.

Dataset leads

Concrete datasets

Active promise: Answered prayer

Use these as places to look for measurable outcomes, baseline rates, matched comparison groups, and possible confounders before treating a story as evidence.

Confounder checklist

Ordinary causes to control

Active promise: Answered prayer

Commitment

How much risk will the claim take?

A test only matters if the claim is allowed to be clear beforehand and disappointed afterward.

Claim wording

Willingness to run the selected test: 35%

Low values keep the score protected even if a strong study is listed.

Would a clean failure count? 25%

Move this right only if a fair miss would genuinely count against the claim.

Custom study

Build a test protocol

Open this when the prepared studies miss the exact earthly claim being tested.

Define the outcome, comparison, miss rule, and rigor details before using this as the active study.

Outcome

Comparison group

Sample size

Blinding

Preregistered rules

Replication

What would count as a miss?

Custom rigor: 18 / 100

Outcome rules

Set the result rules first

The more specific these rules are before the test, the harder it is to rename every result as a win after the data arrive.

Counts for the claim

Counts against the claim

Counts as neutral
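One way to picture the three result categories above is as rules frozen before any data arrive. This is a minimal sketch under stated assumptions: the class name, field names, and the 0.10 and 0.02 cutoffs are hypothetical, and "effect" stands for whatever outcome difference the study measures.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: the rules cannot be edited after they are set
class OutcomeRules:
    """Result rules written down before the test (hypothetical fields)."""
    counts_for_at_least: float      # an effect this large or larger counts for the claim
    counts_against_at_most: float   # an effect this small or smaller counts against it

    def classify(self, observed_effect: float) -> str:
        if observed_effect >= self.counts_for_at_least:
            return "for"
        if observed_effect <= self.counts_against_at_most:
            return "against"
        return "neutral"

# Illustrative cutoffs, chosen in advance.
rules = OutcomeRules(counts_for_at_least=0.10, counts_against_at_most=0.02)

print(rules.classify(0.15))  # for
print(rules.classify(0.05))  # neutral
print(rules.classify(0.00))  # against
```

The point of the frozen structure is that a result between the two cutoffs is labeled neutral, not quietly renamed as a win.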

Step 3

Change your mind

What result would change your mind?

Write this before choosing excuses. It should be concrete enough that another person could tell whether the result happened.

Pre-commitment statement

Step 4

Name excuses

Select the responses a defender could use after a poor result. The warning is simple: if every result can be explained away, the claim has left the field.

Step 5

Read result

Protected from ordinary checks

This claim is mostly being held away from tests that could make confidence go down.

Personal and active in earthly life?

This result asks whether the current stance treats the promise as a real-world claim about a personal God acting in earthly life, or keeps it insulated from ordinary feedback.

Study: Anecdote

Willingness: 35%

Clean failure: 25%

Excuse drag: 0

Position: 0 / 100

How this score is calculated

This is an ordinal teaching score, not a probability. It combines study rigor, willingness to run the test, willingness to let a clean miss count, and escape-hatch drag. The cutoffs are practical benchmarks, so the direction of movement matters more than any single exact point.

Study rigor: 8/100

Run multiplier: 0.47

Miss multiplier: 0.40

Excuse drag: 0

Raw score: 9.4

Formula: score = 8 + (study × run × miss × 0.92) - drag, then clamped and rounded. Thresholds: below 35, protected; 35-49, near benchmark; 50-71, testable; 72 and above, exposed to clean disconfirmation.
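The arithmetic can be checked with a short sketch that follows the formula and thresholds as stated. The function names are hypothetical, the 0-100 clamp range is an assumption based on the "/ 100" displays, and the multipliers are taken as given because the text does not say how they are derived from the 35% and 25% settings.

```python
def position_score(study, run, miss, drag):
    """Return (raw, score) for the stated formula; study is on a 0-100 scale."""
    raw = 8 + study * run * miss * 0.92 - drag
    # Assumed clamp range of 0-100. Python's round() sends exact halves to the
    # nearest even integer; how the tool rounds halves is not stated.
    score = max(0, min(100, round(raw)))
    return raw, score

def stance(score):
    """Map a score onto the four stated threshold bands."""
    if score < 35:
        return "protected"
    if score < 50:
        return "near benchmark"
    if score < 72:
        return "testable"
    return "exposed to clean disconfirmation"

# Values shown in the text: study rigor 8, run 0.47, miss 0.40, drag 0.
raw, score = position_score(study=8, run=0.47, miss=0.40, drag=0)
print(round(raw, 1), score, stance(score))  # 9.4 9 protected
```

With a maximal study (100) and both multipliers at 1, the same formula gives 8 + 92 = 100, which is consistent with the score topping out at 100.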

All promises

Suite-wide stance result

This checks whether the full set of promises is being handled consistently. It summarizes every promise's score, study choice, willingness to count misses, and escape hatches. Running it also switches the AI Review prompt to all promises.

Click "Analyze all promises" to generate a suite-wide diagnosis and prepare the AI Review prompt for every promise at once.

Promise field map

All promises on one field

Compare every promise at once: higher points are more exposed to checks, larger points use stronger studies, and darker rings show escape hatches that still protect the claim.

Point size = study strength; dark ring = protected by excuses; 50 = falsifiability threshold; yellow halo = active promise.

Promise	Score	Study	Miss	Excuses	Stance

Printable report

Classroom / debate-prep report

Questions and answers

Earthly Promise Test Field Q&A

Open this for questions about this field's promises, thresholds, evidence gradient, study choices, confounders, escape hatches, and pre-commitments.

Does this tool assume God does not exist?

No. The tool is narrower than that. It asks whether an earthly promise is being treated as a public claim about what happens in the world, or as a protected interpretation that cannot be disappointed by outcomes.

A user can believe in God while still admitting that a particular claim about healing, protection, prophecy, guidance, or behavior has not been exposed to a strong test. The tool is about the posture toward verification, not a final metaphysical verdict.

Why focus on earthly promises instead of theology in general?

Some theological claims are not obviously measurable: ultimate purpose, moral grounding, worship, or the meaning of suffering may not produce a clean public prediction. This field focuses on claims that do point outward: people are healed, prayers change outcomes, believers are protected, guidance improves decisions, or communities become visibly better.

Once a claim says that something happens here, ordinary questions become reasonable. How often does it happen? Compared with what? What would count as a miss? Could a similar pattern arise without the supernatural explanation?

Are anecdotes useless?

Anecdotes are not useless. They can suggest what to study, reveal what people care about, and sometimes expose a claim that deserves a closer look. But an anecdote is weak evidence for a general promise because it usually hides the denominator: the misses, ordinary recoveries, ambiguous cases, and comparable stories from other groups.

The score lets anecdotes move a claim a little because they are not nothing. They just do not move it far unless they become logs, records, comparisons, preregistered outcomes, and repeated checks.
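The hidden denominator can be shown with a small sketch. The request log below is invented for illustration, and the field names are hypothetical; the only point is the difference between counting every logged request and counting only the remembered hits.

```python
# Hypothetical request log: every request is kept, including the misses.
log = [
    {"request": "recovery within 30 days", "answered": True},
    {"request": "job offer by June", "answered": False},
    {"request": "safe travel", "answered": True},
    {"request": "remission at next scan", "answered": False},
    {"request": "reconciliation", "answered": False},
]

hits = sum(1 for entry in log if entry["answered"])
hit_rate = hits / len(log)  # 2 of 5 requests

# An anecdote sample keeps only the memorable hits, so its apparent rate
# is 100% no matter what the full log looks like.
anecdotes = [entry for entry in log if entry["answered"]]
anecdote_rate = sum(1 for e in anecdotes if e["answered"]) / len(anecdotes)

print(hit_rate, anecdote_rate)  # 0.4 1.0
```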

Is this scientism?

Not if the claim being discussed is an earthly effect. The tool does not say that all meaning, morality, beauty, or metaphysics must be settled by a laboratory. It says that claims about observable outcomes should be willing to face the kinds of checks appropriate to observable outcomes.

If a defender says the claim is not about measurable outcomes, that may be a coherent theological retreat. But then the claim should stop being advertised as evidence that a personal God is actively producing distinctive public effects in earthly life.

What does falsifiable mean here?

In this tool, a claim is more falsifiable when the user can say in advance what result would count against it, and then allow that result to matter. Falsifiability does not require perfect certainty or one decisive experiment. It requires real risk.

A claim becomes less falsifiable when every possible result can be absorbed: healing confirms God, no healing confirms God's mysterious will, delayed healing confirms God's timing, and ordinary outcomes are still counted as equally confirming. At that point the language may sound empirical while the structure is protected.

Why is the falsifiability threshold somewhat arbitrary?

There is no natural law that says a claim becomes meaningfully testable at exactly one score. The threshold is a practical benchmark for a teaching tool. It marks the region where the user has accepted enough clarity, comparison, miss-counting, and outcome risk that evidence can begin to bite.

The exact line is less important than the direction of movement. A claim that moves right is accepting more public accountability. A claim that stays left is being protected from the very outcomes it invokes.

What if God is free and does not perform on command?

A free agent does not have to perform on command. But that response changes the evidential claim. If a promise is advertised as a real earthly pattern, such as believers being healed, protected, guided, or morally transformed, then it is fair to ask whether the pattern appears more often than it does in comparable cases.

If the answer is that God may act or not act in any case, for any hidden reason, with no expected difference in public outcomes, the claim may still be devotional. It is just no longer functioning as a testable claim about a distinctive earthly effect.

Does "do not test God" end the discussion?

It can end one kind of discussion: the defender may refuse to submit the claim to testing. But it also changes what the claim can do rhetorically. A claim cannot fairly be used as public evidence while also being shielded from ordinary public checks whenever those checks become inconvenient.

The tool treats "do not test God" as an escape hatch because it often appears after a concrete claim has already been made. If the claim was never meant as evidence, that should be stated plainly at the beginning.

What if the effect only happens for sincere believers?

That can be tested only if sincerity is defined before the outcome is known. If sincerity is judged afterward, then every failure can be removed from the sample by saying the person, prayer, church, or observer was not sincere enough.

A fairer version would define inclusion criteria first: frequency of practice, stated belief, community membership, prayer behavior, or another measurable marker. Then the study keeps all eligible cases, including disappointing ones.
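A sketch of what "inclusion criteria first" might look like in practice follows. The cases and the two markers, `weekly_practice` and `stated_belief`, are hypothetical stand-ins for whatever definition of sincerity is agreed on before outcomes are known.

```python
# Hypothetical cases; the outcome field is "improved".
cases = [
    {"id": 1, "weekly_practice": True,  "stated_belief": True, "improved": True},
    {"id": 2, "weekly_practice": True,  "stated_belief": True, "improved": False},
    {"id": 3, "weekly_practice": False, "stated_belief": True, "improved": True},
    {"id": 4, "weekly_practice": True,  "stated_belief": True, "improved": False},
]

def is_eligible(case):
    """Inclusion rule fixed in advance; it never reads the outcome field."""
    return case["weekly_practice"] and case["stated_belief"]

# All eligible cases stay in the sample, including the disappointing ones.
eligible = [c for c in cases if is_eligible(c)]
improved_rate = sum(c["improved"] for c in eligible) / len(eligible)

print([c["id"] for c in eligible], improved_rate)
```

Because `is_eligible` cannot see the outcome, a failed case cannot be removed afterward by declaring the person insufficiently sincere.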

Why are comparison groups so important?

Many outcomes happen without the claimed supernatural cause. People recover from illness, avoid accidents, make good decisions, receive help, and become kinder for ordinary reasons. A comparison group asks whether the promised effect happens more than ordinary life already produces.

Without comparison, a hit can feel extraordinary simply because the baseline is invisible. With comparison, the question becomes cleaner: did the believers, pray-ers, or claimed recipients do better than similar people facing similar conditions?
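The comparison question can be sketched as a difference between two rates. The counts below are invented for illustration, and the unpooled normal-approximation z statistic is an assumed, simplified way to ask whether the gap is larger than chance variation; a real study would also need matching and confounder controls.

```python
import math

# Invented counts for illustration only.
prayed_for = {"recovered": 42, "total": 100}
comparison = {"recovered": 40, "total": 100}

prayed_rate = prayed_for["recovered"] / prayed_for["total"]
baseline_rate = comparison["recovered"] / comparison["total"]
difference = prayed_rate - baseline_rate  # about 0.02

# Standard error of the difference between two independent proportions.
std_error = math.sqrt(
    prayed_rate * (1 - prayed_rate) / prayed_for["total"]
    + baseline_rate * (1 - baseline_rate) / comparison["total"]
)
z = difference / std_error

# |z| below roughly 1.96 means the gap is within ordinary sampling noise
# at the conventional 5% level.
print(round(difference, 2), round(z, 2))
```

Here a 42% recovery rate sounds impressive on its own, but next to a 40% baseline the difference is small relative to the noise.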

How should dataset leads be used?

Dataset leads are starting points, not proof. They point toward records that could make the claim less dependent on memory and testimony: medical records, public-health data, prediction registries, crime statistics, divorce records, mortality records, or documented request logs.

A dataset becomes useful when paired with a clear outcome, a comparison group, confounder controls, and a rule for what would count against the claim. Raw data without a question can still be cherry-picked.

What makes a proposed study fair rather than hostile?

A fair study aims the test at the actual claim, uses outcomes the defender agrees are relevant, and controls ordinary causes without building in a skeptical conclusion. It should not demand impossible access or a standard unrelated to the promise.

A fair study also lets failure count. If the study is accepted only when it succeeds, but reinterpreted as spiritually invalid when it disappoints, the problem is not hostility. The problem is that the claim was never really on the field.

What if results are mixed?

Mixed results are common. That is why the tool includes neutral outcomes. A study might be underpowered, poorly executed, confounded, or ambiguous. In that case the best answer may be: this test did not settle the matter.

But "mixed" should not automatically mean "confirmed." The user should ask whether the pattern is stronger than ordinary baselines, whether better studies reproduce it, and whether misses are being counted with the same seriousness as hits.

How should the "what would change your mind" step be written?

A strong pre-commitment names the outcome, comparison, deadline, and expected direction. For example: "If a preregistered comparison of matched patients shows no recovery advantage for targeted prayer, that would lower my confidence in this prayer-healing claim."

A weak pre-commitment says only that the result would need to be "convincing" or "fair." Those words can be moved later. The goal is not to force belief change from one study; it is to prevent the goalposts from moving after the result is known.

How can this be used in a classroom or debate?

Start by asking the defender to choose one concrete promise and write it in earthly terms. Then have them choose a study they would actually allow, define what would count for and against the claim, and write what result would lower confidence before any escape hatches are selected.

The most productive discussion is usually not "Does God exist?" but "What is this specific claim allowed to risk?" The printable report can preserve the commitments so later discussion can focus on consistency rather than memory.

AI review

Comprehensive AI prompt

Prompt scope: Active promise

Use this prompt with another AI to ask for a deeper analysis of the current claim, study choice, willingness settings, and escape hatches.