Survivorship bias in prediction-market backtests: a 30-line reproduction
Any prediction-market scraper that hits a venue's /markets endpoint at
backtest time silently drops every market the venue has archived. The result: you backtest
on a curated population that excludes the very markets your strategy would have lost on.
Your Sharpe is overstated. Here's the proof, the magnitude, and the schema-level fix.
The bias mechanism
Pull /markets from any of the three major venues today. You'll get a list of
currently-visible markets — typically active or recently-resolved. Pull again in 90 days.
A subset of the markets you got the first time will be missing. They've been
archived: pruned from the default listing, sometimes with API access
disabled, sometimes with the data physically removed.
Why venues archive markets:
- Resolved long ago and no longer interesting (sports, daily-event)
- Cancelled / void-resolved (e.g., the underlying event didn't occur as defined)
- Disputed outcomes that the venue eventually invalidated
- Cleanup of test markets, anomalous markets, or markets with regulatory issues
None of those are random with respect to outcome. Markets that resolve unusually, get disputed, or get cancelled are systematically more likely to disappear. If your backtest only sees markets that didn't get archived, your sample is censored in exactly the direction that flatters your strategy.
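The censoring direction is easy to demonstrate with a toy simulation. Everything below is invented for illustration, the archive probabilities especially: losing markets are archived more often, and the surviving sample's mean P&L drifts upward even though the true mean is zero.

```python
import random
import statistics

random.seed(0)

# toy universe: per-market strategy P&L, mean zero by construction
pnl = [random.gauss(0.0, 1.0) for _ in range(10_000)]

def archived(x: float) -> bool:
    # censoring correlated with outcome: losers vanish 4x as often
    # (probabilities invented purely for illustration)
    return random.random() < (0.40 if x < 0 else 0.10)

survivors = [x for x in pnl if not archived(x)]

print(f"full-universe mean pnl: {statistics.mean(pnl):+.3f}")
print(f"survivor-only mean pnl: {statistics.mean(survivors):+.3f}")
```

The survivor-only mean comes out clearly positive even though no edge exists; the entire "return" is sample selection.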
A 30-line reproduction
Take any scraper that pulls a venue's /markets list and runs a backtest.
Show that running the same backtest against a snapshot of the venue's universe
from 90 days ago returns a different, worse, more-realistic Sharpe.
# pseudo-code; replace fetch_markets with your venue's API call
from datetime import date, timedelta

today_universe = fetch_markets(as_of=date.today())
historical_dump = fetch_markets(as_of=date.today() - timedelta(days=90))

# key on market IDs so the set operations are well-defined
today_ids = {m["id"] for m in today_universe}
hist_ids = {m["id"] for m in historical_dump}

# the markets that were visible 90 days ago and are now gone
disappeared = hist_ids - today_ids
survived = hist_ids & today_ids
print(f"disappeared: {len(disappeared)} / {len(hist_ids)}")

# run the same strategy on each subset
sharpe_survivor = backtest(strategy, universe=survived).sharpe()
sharpe_full = backtest(strategy, universe=hist_ids).sharpe()
print(f"sharpe survivor-only: {sharpe_survivor:.2f}")
print(f"sharpe full universe: {sharpe_full:.2f}")
print(f"survivorship-bias inflation: {sharpe_survivor - sharpe_full:.2f}")
The catch is fetch_markets(as_of=...): it requires you to have stored the
venue's universe historically. If your scraper started running last week, you can't.
The data is gone unless someone archived it for you.
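Going forward, the missing as-of lookup is cheap to build: freeze each day's listing to disk and the historical query becomes a file read. A minimal sketch, assuming a layout of one JSON file per day (directory name invented):

```python
import json
from datetime import date
from pathlib import Path

SNAPSHOT_DIR = Path("universe_snapshots")  # invented location

def store_snapshot(markets: list, as_of: date) -> Path:
    """Freeze one day's /markets listing to disk, one file per day."""
    SNAPSHOT_DIR.mkdir(exist_ok=True)
    path = SNAPSHOT_DIR / f"{as_of.isoformat()}.json"
    path.write_text(json.dumps(markets, default=str))
    return path

def load_snapshot(as_of: date) -> list:
    """The historical universe a backtest needs; raises if no snapshot exists."""
    return json.loads((SNAPSHOT_DIR / f"{as_of.isoformat()}.json").read_text())
```

Cron it daily against the live /markets endpoint and the 90-day comparison above becomes a diff of two snapshots.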
Magnitude on a real sample
We don't have 90 days of universe snapshots from before this dataset existed (we wish
we did). What we can show is the shape of disappearance using closed Polymarket
markets ordered by volume. Their gamma-api lets you walk closed markets descending by
volumeNum; in practice the ratio of "still listed" to "archived" steps
sharply at certain volume thresholds.
A back-of-envelope check: of the ~50,000 closed Polymarket markets we walked at
offsets 0–50,000, every one had volume above $140k. Below that threshold the markets
still exist in the database but never surface in the descending listing past a certain
depth. Our gamma-api pull capped out at the threshold; retrieving anything below it
would take explicit per-market condition_id lookups, exactly the kind of data that
disappears for a backtester who didn't think to record IDs eagerly.
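The walk itself is plain offset pagination. In this sketch the gamma-api query parameters (closed, order, ascending, limit, offset) and the conditionId field are assumptions based on Polymarket's docs at the time of writing; verify them before relying on this. The habit that matters is recording IDs eagerly from every page:

```python
import json
import urllib.request

GAMMA = "https://gamma-api.polymarket.com/markets"

def fetch_page(offset: int, limit: int = 100) -> list:
    # query parameter names assumed from Polymarket's gamma-api docs
    url = (f"{GAMMA}?closed=true&order=volumeNum&ascending=false"
           f"&limit={limit}&offset={offset}")
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def walk_closed(fetch=fetch_page, limit: int = 100):
    """Yield every market from the descending-volume listing until it runs dry."""
    offset = 0
    while True:
        page = fetch(offset, limit)
        if not page:
            return
        yield from page
        offset += limit
```

Persist the stable key (conditionId here, assumed) for every yielded market; those IDs are what let you do per-market lookups after the listing stops surfacing them.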
Manifold archives faster than Polymarket — the in-app feed shows recent markets; anything more than ~6 months old is hard to surface without targeted IDs. Kalshi keeps old markets queryable but with reduced indexing and doc presence.
Empirically, every prediction-market backtest paper we've read either (a) explicitly addresses survivorship bias and excludes pre-2023 data, or (b) reports inflated Sharpe ratios that don't replicate out of sample. The consistent gap between published and replicated results is a tell.
The schema-level fix: deletion ledger
Our canonical schema includes a deletion_ledger table:
CREATE TABLE deletion_ledger (
    market_id     TEXT        PRIMARY KEY,
    venue_id      TEXT        NOT NULL,
    last_seen_at  TIMESTAMPTZ NOT NULL,
    deleted_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    last_snapshot JSONB       NOT NULL  -- frozen markets row
);
Whenever the daily reconciliation observes that a market that was present yesterday
is now missing from the venue listing, we don't drop it from markets. We
mark deleted_at and freeze the last-known snapshot in the ledger. The
market remains in the canonical schema with all its trades and outcomes, just flagged.
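The reconciliation itself is small. A sketch using an in-memory SQLite stand-in for the Postgres schema (TIMESTAMPTZ and JSONB degrade to TEXT here, and the markets table shape is invented):

```python
import sqlite3
from datetime import datetime, timezone

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE markets (market_id TEXT PRIMARY KEY, venue_id TEXT, payload TEXT);
CREATE TABLE deletion_ledger (
    market_id     TEXT PRIMARY KEY,
    venue_id      TEXT NOT NULL,
    last_seen_at  TEXT NOT NULL,
    deleted_at    TEXT NOT NULL,
    last_snapshot TEXT NOT NULL
);
""")

def reconcile(visible_today: set, last_seen_at: str) -> None:
    """Ledger any market known to us but absent from today's venue listing."""
    now = datetime.now(timezone.utc).isoformat()
    rows = db.execute("SELECT market_id, venue_id, payload FROM markets").fetchall()
    for market_id, venue_id, payload in rows:
        if market_id not in visible_today:
            # freeze the last-known row; the markets row itself is never deleted
            db.execute(
                "INSERT OR IGNORE INTO deletion_ledger VALUES (?, ?, ?, ?, ?)",
                (market_id, venue_id, last_seen_at, now, payload),
            )
    db.commit()
```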
A survivorship-aware backtest joins on markets + deletion_ledger
and includes the deleted markets in the universe. Your Sharpe drops, your strategy gets
tested against the world as it actually was, and you find out before live money does.
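With deleted_at recorded (the time we observed the disappearance, so an upper bound on the true deletion time), "what was visible on date X?" becomes a single filter. A minimal sketch over plain dicts, using ISO date strings so lexicographic comparison is chronological:

```python
def universe_as_of(markets: dict, ledger: dict, as_of: str) -> set:
    """markets maps market_id -> created_at; ledger maps market_id -> deleted_at.
    Returns the IDs visible on as_of, including markets archived later."""
    visible = set()
    for market_id, created_at in markets.items():
        if created_at > as_of:
            continue  # didn't exist yet
        deleted_at = ledger.get(market_id)
        if deleted_at is not None and deleted_at <= as_of:
            continue  # already gone from the venue by as_of
        visible.add(market_id)
    return visible
```

Feed the result into the backtest universe and the archived-but-then-visible markets come back into the sample, which is the whole point of the ledger.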
Resolution data: the other half of the bias
Survivorship bias has a sibling: outcome bias. If you backtest on currently-visible markets, the resolution status is "we know it now" — even if the market traded for months without a known outcome. Strategies that look great because they "knew" the resolution by virtue of the data being post-resolution are a classic look-ahead trap.
Our outcomes table fixes this with three explicit states:
- final_payout = 1: winner, on a market with resolved_at ≤ the backtest as_of
- final_payout = 0: loser, same condition
- final_payout = NULL: unresolved at the relevant time
Combined with a resolved_at IS NULL OR resolved_at > as_of filter in your
backtest (any market matching it is treated as unresolved), this lets you simulate the
world as it was on any prior date without leaking future resolutions.
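The guard reduces to one function: whatever the database knows today, anything not yet resolved on the backtest date must report NULL. A sketch (field names match the outcomes table described above):

```python
from datetime import date
from typing import Optional

def payout_as_of(final_payout: Optional[int],
                 resolved_at: Optional[date],
                 as_of: date) -> Optional[int]:
    """Return the payout visible at as_of; None means 'unresolved back then'."""
    if resolved_at is None or resolved_at > as_of:
        return None  # the future resolution must not leak into the backtest
    return final_payout
```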
Validation discipline
We borrowed the validation rigor from a parallel equity-trading project: bootstrap CI, permutation test, BH-FDR (q=0.10), out-of-sample ≥1 year, walk-forward stack, SPY-only counterfactual, top-bucket toxicity check. The same harness flags survivorship-biased strategies because they fail on out-of-sample windows that include now-archived markets.
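As one concrete example of those gates, here is a sign-flip permutation test on daily P&L, a sketch rather than our exact harness: under the null that daily signs are exchangeable, random sign flips should produce means as large as the observed one reasonably often.

```python
import random
import statistics

def permutation_p(daily_pnl: list, n_perm: int = 10_000, seed: int = 0) -> float:
    """p-value for mean(daily_pnl) > 0 under a sign-flip null."""
    rng = random.Random(seed)
    observed = statistics.mean(daily_pnl)
    hits = 0
    for _ in range(n_perm):
        flipped = [x if rng.random() < 0.5 else -x for x in daily_pnl]
        if statistics.mean(flipped) >= observed:
            hits += 1
    # +1 correction keeps the estimate conservative and nonzero
    return (hits + 1) / (n_perm + 1)
```

A strategy that only looks good because of survivorship censoring tends to fail this once the archived markets are restored to the sample.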
A working hedge that passed our internal gates (TLT TOM bond-flow) reconciled at
Sharpe 0.65 / OOS 0.97 / perm_p 0.0002. We don't promote anything to live
without OOS validation that includes the deleted/archived market subset.
Takeaways
- Survivorship bias in prediction markets is real and not random. Archived markets correlate with cancelled / disputed / cleaned-up outcomes — exactly the kind of trades your strategy would have lost on.
- If your scraper hits /markets at backtest time, you have this bias. Backtests run on "currently visible" data are biased upward.
- The fix is structural: keep deleted markets in the schema. A deletion ledger plus per-day snapshots lets you ask "what was visible on 2024-09-01?" and get an honest answer.
- Outcome bias is the silent twin. Use resolved_at explicitly in backtest filters to prevent post-hoc resolution leak.
The pred-markets dataset retains every resolved and pruned market we've seen since ingestion started, with last-known snapshots in the deletion ledger. The schema is above; email us if you're running a backtest where this matters and want the historical universe reconstituted.