Verification willingness audit

Earthly Promise Test Field

Put concrete God-claims on a field and see whether they are actually being exposed to ordinary checks, or kept safe by flexible explanations after the fact.

How to use this field

Choose a claim, pick the strongest test you would actually allow, then set whether a clean failure would lower your confidence. The score and promise map move right when the claim is made clear, risky, public, and checkable.

Select escape hatches only if you would really use them. Each one moves the claim back left because it makes the claim harder to lose.

Active promise: This is the promise currently being edited. Other promises keep their own settings, so you can compare different claims at once. Current value: Answered prayer. Prayer produces outcomes beyond ordinary timing and recovery.
Test strength: Stronger tests use public records, clear outcomes, comparison groups, blinding, pre-set rules, or repeated trials. Current value: Anecdote. Barely moves the claim.
Escape hatches: Escape hatches are reasons that make misses unable to count. They may be sincere, but they reduce the claim's exposure to evidence. Current value: 0 active. No protective responses selected.
Field position: This shows how far right the selected promise has moved. The threshold line is a practical benchmark, not a natural boundary; the exact cutoff is somewhat arbitrary, but moving right means stronger exposure to tests that could disappoint the claim. Current value: Protected. Protected from ordinary checks.

Active promise

Answered prayer

Prayer changes outcomes beyond ordinary timing, recovery, or coincidence.

9 / 100
Current test: Anecdote

Choose a promise

Study: 8/100. The scientific rigor of the selected study. This is an ordinal rubric estimate based on outcome clarity, miss-counting, comparison groups, blinding, preregistration, sample size, and replication.
Run: 35%. How willing the user is to actually submit this promise to the selected test. Lower values keep even a strong study from moving the claim very far.
Miss: 25%. Whether a clean, fair failure would count against the promise. If no miss would matter, the claim is being protected from falsification.
Drag: 0. The leftward pull from selected escape hatches. More excuses mean poor results are easier to dismiss, so the claim becomes less testable.

Evidence gradient

From story to robust science

The score moves right as the claim accepts clearer outcomes, full miss-counting, comparison groups, blinding, preregistration, replication, and large samples.

01 Anecdote

Memorable hit, little visibility into misses.

02 Complete log

Requests, dates, outcomes, and misses are kept.

03 Records

Medical, public, or timestamped data replace memory.

04 Matched comparison

Similar cases are compared instead of isolated.

05 Blinded rules

Outcomes and scoring are set before results are known.

06 Large replicated study

Very large samples, independent teams, and repeated checks.

Bad study / better study

These pairs show why rigor matters. The better version does not need to be perfect; it simply makes hits, misses, ordinary causes, and comparison cases visible.

Healing

Bad: testimony after treatment

Only the dramatic recovery is shown, with no original diagnosis, treatment history, or miss count.

Better: consecutive case review

Every submitted case includes records, baseline severity, treatment, prayer timing, and independent scoring.

Prayer

Bad: one answered request

A memorable hit is counted while similar unanswered requests disappear from the sample.

Better: logged requests with controls

Requests are timestamped, misses stay in, comparable cases are matched, and reviewers do not know prayer status.

Prophecy

Bad: vague impressions after the event

The meaning is settled after the outcome, so almost any later event can be made to fit.

Better: locked prediction registry

Predictions are dated, specific, scored by fixed rules, and compared against ordinary forecasting baselines.

Behavior

Bad: changed-life stories

The group reports its own best examples without checking comparable communities or public outcomes.

Better: matched public records

Crime, divorce, obesity, giving, safeguarding, and restitution are compared with controls for culture and demographics.

Step 2

Choose a study

These studies are not meant to be impossible standards. They are ordinary ways to ask whether an earthly claim can survive records, comparison groups, hidden scoring, and visible misses.

Can the promise survive this degree of scrutiny?

These are intentionally ordinary starting points: logs, records, matched comparisons, blinded review, and small pilots before large studies. The point is not to demand perfection; it is to see whether the claim is allowed to meet normal public checks.

Dataset leads

Concrete datasets

These are leads for the kinds of records that could check the claim. A dataset lead is not proof by itself; it becomes useful when paired with clear outcomes, fair comparison groups, controls for ordinary causes, and a rule for what would count against the promise.

Active promise: Answered prayer

Use these as places to look for measurable outcomes, baseline rates, matched comparison groups, and possible confounders before treating a story as evidence.

Confounder checklist

Ordinary causes to control

A confounder is an ordinary factor that could produce the same pattern without the supernatural claim being true. A fair test tries to separate these before interpreting the result.

Active promise: Answered prayer

    Commitment

    How much risk will the claim take?

    A test only matters if the claim is allowed to be clear beforehand and disappointed afterward.

    Run: 35%

    Low values keep the score protected even if a strong study is listed.

    Miss: 25%

    Move this right only if a fair miss would genuinely count against the claim.

    Custom study

    Build a test protocol

    Open this when the prepared studies miss the exact earthly claim being tested.

    Define the outcome, comparison, miss rule, and rigor details before using this as the active study.

    Custom rigor: 18 / 100

    Outcome rules

    Set the result rules first

    Result rules prevent after-the-fact rescue. Before seeing outcomes, say what would support the claim, what would count against it, and what would simply be too messy or underpowered to interpret.

    The more specific these rules are before the test, the harder it is to rename every result as a win after the data arrive.

    Step 3

    Change your mind

    This is the pre-commitment. Before escape hatches are selected, name the result that would actually lower confidence. Otherwise the claim can be protected after every disappointing outcome.

    What result would change your mind?

    Write this before choosing excuses. It should be concrete enough that another person could tell whether the result happened.

    Step 4

    Name excuses

    Select the responses a defender could use after a poor result. The warning is simple: if every result can be explained away, the claim has left the field.

    Step 5

    Read result

    The summary uses your current choices. Change the study, sliders, or escape hatches and the diagnosis will update immediately.

    Protected from ordinary checks

    This claim is mostly being held away from tests that could make confidence go down.

    Personal and active in earthly life?

    This result asks whether the current stance treats the promise as a real-world claim about a personal God acting in earthly life, or keeps it insulated from ordinary feedback.

    Study: Anecdote
    Willingness: 35%
    Clean failure: 25%
    Excuse drag: 0
    Position: 0 / 100

    How this score is calculated

    This is an ordinal teaching score, not a probability. It combines study rigor, willingness to run the test, willingness to let a clean miss count, and escape-hatch drag. The cutoffs are practical benchmarks, so the direction of movement matters more than any single exact point.

    Study rigor: 8/100
    Run multiplier: 0.47
    Miss multiplier: 0.40
    Excuse drag: 0
    Raw score: 9.4

    Formula: score = 8 + study × run × miss × 0.92 − drag, clamped to 0-100 and rounded. With the values shown, 8 + 8 × 0.47 × 0.40 × 0.92 − 0 ≈ 9.4. Thresholds: below 35 protected; 35-49 near benchmark; 50-71 testable; 72+ exposed to clean disconfirmation.
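The formula and thresholds above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the function names `field_score` and `stance` are hypothetical, the 0-100 clamp range is assumed from the score's stated scale, and the run and miss multipliers are passed in directly because the page does not say how they are derived from the 35% and 25% slider values.

```python
def field_score(study: float, run: float, miss: float, drag: float = 0.0) -> tuple[float, int]:
    """Return (raw score, displayed score) for one promise.

    study: study rigor on the 0-100 rubric
    run, miss: the run and miss multipliers shown in the breakdown
    drag: total leftward pull from selected escape hatches
    """
    raw = 8 + study * run * miss * 0.92 - drag
    # Assumption: the displayed score is clamped to 0-100, then rounded.
    clamped = min(100.0, max(0.0, raw))
    return raw, round(clamped)


def stance(score: int) -> str:
    """Map a displayed score to the threshold labels listed in the text."""
    if score < 35:
        return "protected"
    if score < 50:
        return "near benchmark"
    if score < 72:
        return "testable"
    return "exposed to clean disconfirmation"


# Values from the breakdown above: rigor 8, run 0.47, miss 0.40, no drag.
raw, score = field_score(8, 0.47, 0.40)
label = stance(score)
```

With the values from the breakdown, `raw` comes out just under 9.4 and the displayed score lands in the protected band, which matches the figures shown.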

    All promises

    Suite-wide stance result

    This aggregates every promise, not just the active one. It is useful when a person says many earthly promises are real, but only some are allowed to face robust checks or count clean failures.

    This checks whether the full set of promises is being handled consistently. It summarizes every promise's score, study choice, willingness to count misses, and escape hatches. Running it also switches the AI Review prompt to all promises.

    Click Analyze all promises to generate a suite-wide diagnosis and prepare the AI Review prompt for every promise at once.

    Promise field map

    All promises on one field

    Each point is one earthly promise. Height is the current field score: a 0-100 summary of study strength, willingness to run the test, willingness to count a clean failure, and any escape hatches still selected.

    Compare every promise at once: higher points are more exposed to checks, larger points use stronger studies, and darker rings show escape hatches that still protect the claim.

    Point size = study strength. Larger points mean the selected study uses stronger controls, clearer outcomes, larger samples, blinding, preregistration, or replication.
    Dark ring = protected by excuses. A darker ring means selected escape hatches are pulling the promise left by making poor results harder to count as real misses.
    50 = falsifiability threshold. The 50 line is a practical benchmark, not a natural boundary. It marks the point where a fair negative result begins to matter.
    Yellow halo = active promise. The yellow glow marks the promise currently being edited in the controls above. Click another point to make that promise active.

    Table columns: Promise, Score, Study, Miss, Excuses, Stance

    Printable report

    Classroom / debate-prep report

    This view turns the current audit into a concise handout. Use Print for a classroom copy, or Copy report to paste it into notes, email, or a shared document.

    Questions and answers

    Earthly Promise Test Field Q&A

    Open this for questions about this field's promises, thresholds, evidence gradient, study choices, confounders, escape hatches, and pre-commitments.

    Does this tool assume God does not exist?

    No. The tool is narrower than that. It asks whether an earthly promise is being treated as a public claim about what happens in the world, or as a protected interpretation that cannot be disappointed by outcomes.

    A user can believe in God while still admitting that a particular claim about healing, protection, prophecy, guidance, or behavior has not been exposed to a strong test. The tool is about the posture toward verification, not a final metaphysical verdict.

    Why focus on earthly promises instead of theology in general?

    Some theological claims are not obviously measurable: ultimate purpose, moral grounding, worship, or the meaning of suffering may not produce a clean public prediction. This field focuses on claims that do point outward: people are healed, prayers change outcomes, believers are protected, guidance improves decisions, or communities become visibly better.

    Once a claim says that something happens here, ordinary questions become reasonable. How often does it happen? Compared with what? What would count as a miss? Could a similar pattern arise without the supernatural explanation?

    Are anecdotes useless?

    Anecdotes are not useless. They can suggest what to study, reveal what people care about, and sometimes expose a claim that deserves a closer look. But an anecdote is weak evidence for a general promise because it usually hides the denominator: the misses, ordinary recoveries, ambiguous cases, and comparable stories from other groups.

    The score lets anecdotes move a claim a little because they are not nothing. They just do not move it far unless they become logs, records, comparisons, preregistered outcomes, and repeated checks.
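The difference between an anecdote and a complete log can be made concrete with a small sketch. The example below assumes a hypothetical `LoggedRequest` record and made-up entries; the outcome labels ("hit", "miss", "ambiguous") are illustrative choices, not something the tool prescribes. The point is only that every logged request stays in the denominator.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LoggedRequest:
    date: str     # when the request was recorded, before the outcome was known
    request: str  # what was asked for
    outcome: str  # "hit", "miss", or "ambiguous" (labels assumed for this sketch)


def summarize(log: list[LoggedRequest]) -> dict[str, float]:
    """Count every logged request so misses and ambiguous cases stay visible."""
    counts = Counter(entry.outcome for entry in log)
    total = len(log)
    return {
        "total": total,
        "hits": counts["hit"],
        "misses": counts["miss"],
        "ambiguous": counts["ambiguous"],
        "hit_rate": counts["hit"] / total if total else 0.0,
    }


# Made-up entries for illustration only; these are not real data.
log = [
    LoggedRequest("2024-01-05", "interview goes well", "hit"),
    LoggedRequest("2024-01-12", "lost keys are found", "miss"),
    LoggedRequest("2024-02-03", "test result is clear", "miss"),
    LoggedRequest("2024-02-20", "conflict is resolved", "ambiguous"),
]
summary = summarize(log)
```

An anecdote would report only the first entry. The log reports one hit out of four requests, which is the figure a comparison against ordinary baselines would need.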

    Is this scientism?

    Not if the claim being discussed is an earthly effect. The tool does not say that all meaning, morality, beauty, or metaphysics must be settled by a laboratory. It says that claims about observable outcomes should be willing to face the kinds of checks appropriate to observable outcomes.

    If a defender says the claim is not about measurable outcomes, that may be a coherent theological retreat. But then the claim should stop being advertised as evidence that a personal God is actively producing distinctive public effects in earthly life.

    What does falsifiable mean here?

    In this tool, a claim is more falsifiable when the user can say in advance what result would count against it, and then allow that result to matter. Falsifiability does not require perfect certainty or one decisive experiment. It requires real risk.

    A claim becomes less falsifiable when every possible result can be absorbed: healing confirms God, no healing confirms God's mysterious will, delayed healing confirms God's timing, and ordinary outcomes are still counted as equally confirming. At that point the language may sound empirical while the structure is protected.

    Why is the falsifiability threshold somewhat arbitrary?

    There is no natural law that says a claim becomes meaningfully testable at exactly one score. The threshold is a practical benchmark for a teaching tool. It marks the region where the user has accepted enough clarity, comparison, miss-counting, and outcome risk that evidence can begin to bite.

    The exact line is less important than the direction of movement. A claim that moves right is accepting more public accountability. A claim that stays left is being protected from the very outcomes it invokes.

    What if God is free and does not perform on command?

    A free agent does not have to perform on command. But that response changes the evidential claim. If a promise is advertised as a real earthly pattern, such as believers being healed, protected, guided, or morally transformed, then it is fair to ask whether the pattern appears more often than it does in comparable cases.

    If the answer is that God may act or not act in any case, for any hidden reason, with no expected difference in public outcomes, the claim may still be devotional. It is just no longer functioning as a testable claim about a distinctive earthly effect.

    Does "do not test God" end the discussion?

    It can end one kind of discussion: the defender may refuse to submit the claim to testing. But it also changes what the claim can do rhetorically. A claim cannot fairly be used as public evidence while also being shielded from ordinary public checks whenever those checks become inconvenient.

    The tool treats "do not test God" as an escape hatch because it often appears after a concrete claim has already been made. If the claim was never meant as evidence, that should be stated plainly at the beginning.

    What if the effect only happens for sincere believers?

    That can be tested only if sincerity is defined before the outcome is known. If sincerity is judged afterward, then every failure can be removed from the sample by saying the person, prayer, church, or observer was not sincere enough.

    A fairer version would define inclusion criteria first: frequency of practice, stated belief, community membership, prayer behavior, or another measurable marker. Then the study keeps all eligible cases, including disappointing ones.

    Why are comparison groups so important?

    Many outcomes happen without the claimed supernatural cause. People recover from illness, avoid accidents, make good decisions, receive help, and become kinder for ordinary reasons. A comparison group asks whether the promised effect happens more than ordinary life already produces.

    Without comparison, a hit can feel extraordinary simply because the baseline is invisible. With comparison, the question becomes cleaner: did the believers, pray-ers, or claimed recipients do better than similar people facing similar conditions?
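As a rough sketch of what "did better than similar people" means numerically, the snippet below compares outcome rates in two groups. The function names and the counts are hypothetical and chosen only for illustration; a real analysis would also need matching, confounder controls, and a significance test, none of which are shown here.

```python
def outcome_rate(successes: int, total: int) -> float:
    """Share of cases in a group that had the promised outcome."""
    if total <= 0:
        raise ValueError("group must contain at least one case")
    return successes / total


def rate_difference(claim_group: tuple[int, int], comparison_group: tuple[int, int]) -> float:
    """Outcome rate in the claim group minus the rate in the comparison group."""
    return outcome_rate(*claim_group) - outcome_rate(*comparison_group)


# Made-up counts for illustration: (cases with the outcome, total cases).
prayed_for = (30, 100)
comparison = (28, 100)
difference = rate_difference(prayed_for, comparison)
```

Seen alone, 30 recoveries can sound impressive. Next to a comparison rate of 28 in 100, the visible advantage is about two percentage points, and whether that exceeds chance is a separate question the sketch does not answer.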

    How should dataset leads be used?

    Dataset leads are starting points, not proof. They point toward records that could make the claim less dependent on memory and testimony: medical records, public-health data, prediction registries, crime statistics, divorce records, mortality records, or documented request logs.

    A dataset becomes useful when paired with a clear outcome, a comparison group, confounder controls, and a rule for what would count against the claim. Raw data without a question can still be cherry-picked.

    What makes a proposed study fair rather than hostile?

    A fair study aims the test at the actual claim, uses outcomes the defender agrees are relevant, and controls ordinary causes without building in a skeptical conclusion. It should not demand impossible access or a standard unrelated to the promise.

    A fair study also lets failure count. If the study is accepted only when it succeeds, but reinterpreted as spiritually invalid when it disappoints, the problem is not hostility. The problem is that the claim was never really on the field.

    What if results are mixed?

    Mixed results are common. That is why the tool includes neutral outcomes. A study might be underpowered, poorly executed, confounded, or ambiguous. In that case the best answer may be: this test did not settle the matter.

    But "mixed" should not automatically mean "confirmed." The user should ask whether the pattern is stronger than ordinary baselines, whether better studies reproduce it, and whether misses are being counted with the same seriousness as hits.

    How should the "what would change your mind" step be written?

    A strong pre-commitment names the outcome, comparison, deadline, and expected direction. For example: "If a preregistered comparison of matched patients shows no recovery advantage for targeted prayer, that would lower my confidence in this prayer-healing claim."

    A weak pre-commitment says only that the result would need to be "convincing" or "fair." Those words can be moved later. The goal is not to force belief change from one study; it is to prevent the goalposts from moving after the result is known.
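One way to picture the difference between a strong and a weak pre-commitment is as a record with four required parts. The sketch below is an assumption-laden illustration: the `PreCommitment` class, the `weaknesses` helper, and the deadline wording are hypothetical, and the vague-word check only catches fields that consist entirely of words like "convincing" or "fair".

```python
from dataclasses import dataclass, fields

# Words the text names as too movable to serve as a commitment on their own.
VAGUE_WORDS = {"convincing", "fair"}


@dataclass
class PreCommitment:
    outcome: str
    comparison: str
    deadline: str
    expected_direction: str


def weaknesses(commitment: PreCommitment) -> list[str]:
    """List parts that are missing or filled with vague wording only."""
    problems = []
    for part in fields(commitment):
        words = getattr(commitment, part.name).lower().split()
        if not words:
            problems.append(f"{part.name} is missing")
        elif all(word.strip('".,') in VAGUE_WORDS for word in words):
            problems.append(f"{part.name} uses only vague wording")
    return problems


strong = PreCommitment(
    outcome="recovery among matched patients",
    comparison="preregistered comparison with matched patients not receiving targeted prayer",
    deadline="end of the preregistered study period",  # hypothetical wording
    expected_direction="no recovery advantage lowers my confidence",
)
weak = PreCommitment(outcome="convincing", comparison="", deadline="", expected_direction="fair")

strong_problems = weaknesses(strong)
weak_problems = weaknesses(weak)
```

The check is deliberately crude. It cannot judge whether a filled-in field is specific enough, only whether the four parts exist and say more than "convincing" or "fair".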

    How can this be used in a classroom or debate?

    Start by asking the defender to choose one concrete promise and write it in earthly terms. Then have them choose a study they would actually allow, define what would count for and against the claim, and write what result would lower confidence before any escape hatches are selected.

    The most productive discussion is usually not "Does God exist?" but "What is this specific claim allowed to risk?" The printable report can preserve the commitments so later discussion can focus on consistency rather than memory.

    AI review

    Comprehensive AI prompt

    This prompt packages the current audit so another AI can critique the stance, look for hidden escape hatches, suggest stronger tests, and ask whether the user's position fits a personal God acting in earthly life.

    Prompt scope: Active promise

    Use this prompt with another AI to ask for a deeper analysis of the current claim, study choice, willingness settings, and escape hatches.