Blog

Findings from the pipeline.
The post-mortems you'd otherwise have to write yourself.

Every quirk we've documented in building this dataset. No marketing fluff — just the actual API misbehaviors, edge cases, and reconciliation methods we ran into. If you've ever built a prediction-market scraper, you've probably hit at least one of these.

Manifold ~7 min read 2026-05-02

How Manifold's `volume` field actually aggregates — and why your scraper is probably wrong

The volume field on a Manifold market does not equal SUM(bet.amount). It also doesn't equal SUM(abs(bet.amount)), except sometimes. The truth is in the per-bet fills[] array, and the public /v0/bets endpoint only surfaces the top level. Here's the reproduction, the drift quantified, and the workaround.
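To make the mismatch concrete, here is a minimal sketch that computes the three candidate aggregates side by side. The records are toy data, not real Manifold responses, and the shape of each `fills` entry (an `amount` key) is an assumption made for illustration.

```python
def volume_candidates(bets):
    """Return three candidate aggregates to compare with a market's reported volume."""
    signed = sum(b["amount"] for b in bets)
    absolute = sum(abs(b["amount"]) for b in bets)
    # Assumption: each fill carries its own "amount"; a bet with no fills
    # falls back to its top-level amount.
    from_fills = sum(
        sum(abs(f["amount"]) for f in b["fills"]) if b.get("fills") else abs(b["amount"])
        for b in bets
    )
    return {"signed": signed, "absolute": absolute, "from_fills": from_fills}


def relative_drift(reported, candidate):
    """Fractional gap between the reported volume and one candidate aggregate."""
    return abs(reported - candidate) / reported if reported else 0.0


# Toy records (hypothetical): the third bet's fills sum to less than its top-level amount.
bets = [
    {"amount": 100.0, "fills": [{"amount": 60.0}, {"amount": 40.0}]},
    {"amount": -30.0, "fills": [{"amount": -30.0}]},
    {"amount": 50.0, "fills": [{"amount": 20.0}]},
]
candidates = volume_candidates(bets)
for name, value in candidates.items():
    print(name, value)
```

On this toy input the three aggregates all differ, which is the kind of disagreement a scraper that only sums top-level amounts would never notice.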

Polymarket ~9 min read 2026-05-02

The Polymarket data-api 3,500-trade offset cap, and how we busted it on-chain

The Polymarket data-api /trades endpoint silently returns 400 above offset 3,500. For markets like Trump 2024 ($1.5B volume, 5M+ trades), that means REST recovers at most ~0.07% of the real history. We map the way out: Goldsky's orderbook subgraph for the standard CTFExchange, plus direct Polygon eth_getLogs for the NegRisk-wrapped markets the subgraph doesn't index. Code + benchmarks.
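The ~0.07% figure is just the offset cap divided by the trade count. A minimal sketch of that arithmetic, assuming the cap bounds the number of reachable trades and ignoring the size of the final page (the function name is hypothetical):

```python
def rest_coverage(offset_cap: int, total_trades: int) -> float:
    """Fraction of a market's trade history reachable below the offset cap."""
    if total_trades <= 0:
        raise ValueError("total_trades must be positive")
    return min(offset_cap, total_trades) / total_trades


# 3,500 reachable trades out of roughly 5M.
coverage = rest_coverage(3_500, 5_000_000)
print(f"{coverage:.2%}")
```

Because the real trade count is "5M+", the true coverage is at or below this value.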

Kalshi ~5 min read 2026-05-02

Kalshi reconciliation: 100% on the public API (here's how we measured it)

Of the three venues we cover, Kalshi is the cleanest by a wide margin: 50 of 50 markets reconciled to 0.0% drift on the Phase-0 sample. The post covers what that actually means, how the sample was selected, the v2 API field rename gotcha (volume_fp vs volume), and why this matters for academic backtests.
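One plausible reading of "reconciled to 0.0% drift" is that a market's reported volume matches the sum of its individual trades. The sketch below encodes that reading. The keys `reported_volume` and `trade_sizes` are hypothetical names for illustration, not Kalshi API fields, and this is not necessarily the measurement the post uses.

```python
def drift_pct(reported: float, recomputed: float) -> float:
    """Absolute gap between reported and recomputed volume, as a percentage of reported."""
    if reported == 0:
        return 0.0 if recomputed == 0 else float("inf")
    return abs(reported - recomputed) / reported * 100.0


def reconciliation_rate(markets, tolerance_pct: float = 0.0):
    """Count markets whose recomputed volume is within tolerance of the reported one."""
    ok = sum(
        1
        for m in markets
        if drift_pct(m["reported_volume"], sum(m["trade_sizes"])) <= tolerance_pct
    )
    return ok, len(markets)


# Toy sample (hypothetical): the first market reconciles, the second is short one contract.
sample = [
    {"reported_volume": 10, "trade_sizes": [4, 6]},
    {"reported_volume": 8, "trade_sizes": [3, 4]},
]
reconciled, total = reconciliation_rate(sample)
print(f"{reconciled}/{total} reconciled")
```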

Methodology ~10 min read 2026-05-02

Survivorship bias in prediction-market backtests: a 30-line reproduction

Every prediction-market scraper drops markets when the venue archives them. The result: backtests run on "current" data are systematically biased toward strategies that traded markets which never resolved badly. A 30-line proof, the magnitude of the effect on a 2024 sample, and the schema-level fix (deletion ledger).
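The effect is easy to see on toy numbers. The sketch below is not the post's 30-line reproduction, and the P&L values are invented for illustration. It compares mean P&L over the markets that are still visible with mean P&L once archived markets are restored from a deletion ledger, modeled here (as an assumption) as a plain list of rows captured before the venue removed them.

```python
from statistics import mean

# Toy per-market strategy P&L (hypothetical values).
visible = [
    {"id": "a", "pnl": 12.0},
    {"id": "b", "pnl": 5.0},
]

# Deletion ledger: rows recorded before the venue archived the market.
deletion_ledger = [
    {"id": "c", "pnl": -20.0},
    {"id": "d", "pnl": -9.0},
]


def mean_pnl(rows):
    """Average P&L across a list of market rows."""
    return mean(r["pnl"] for r in rows)


biased = mean_pnl(visible)                       # what a "current data" backtest sees
corrected = mean_pnl(visible + deletion_ledger)  # with archived markets restored
print(f"visible only: {biased:+.2f}, with ledger: {corrected:+.2f}")
```

In this toy case the sign of the result flips once the archived markets are counted.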

Cross-venue ~8 min read 2026-05-02

Cross-venue calibration: Manifold vs Kalshi on the same Fed events

Manifold (play-money) is widely cited as well-calibrated; Kalshi (real-money, CFTC-regulated) shows favorite-longshot bias in some categories. What does the same FOMC rate decision look like on both venues? Pricing spread, basis decay, and a back-of-envelope arb. Replication code included.

Want one of these earlier? Reply to hello@pred-markets.com with which one and we'll send the draft.
