Kalshi reconciliation: 100% on the public API (here's how we measured it)

Of the three prediction-market venues we currently cover, Kalshi is the cleanest by a wide margin. On the Phase-0 sample, 50 of 50 markets reconciled to 0.0% drift between captured trades and venue-reported volume. No subgraph, no RPC, no work-around — just the public REST API.

What "100% reconciled" means

Reconciliation here is a single equation:

drift_pct = |volume_native − SUM(trades.size_native FILTER size_unit = volume_unit)|
            / volume_native × 100

For each market we sum every trade's size in the market's native unit, take the absolute difference against the venue-reported total volume, and divide. Sub-1% drift = pass. We log every market's drift in recon_log daily.

Kalshi reports volume in contracts; trades report size in contracts (as count_fp). Both sides match unit-wise so the comparison is apples-to-apples.
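The drift equation is small enough to pin down in a few lines. A minimal sketch (drift_pct here is an illustrative helper, not our production normalizer), using Decimal to avoid float noise on fixed-point contract counts; it assumes the sizes passed in are already filtered to the market's native unit:

```python
from decimal import Decimal

def drift_pct(volume_native: Decimal, trade_sizes: list[Decimal]) -> Decimal:
    """Absolute drift between venue-reported volume and the sum of captured
    trade sizes, as a percentage. Assumes trade_sizes is pre-filtered so
    size_unit matches volume_unit."""
    captured = sum(trade_sizes, Decimal(0))
    return abs(volume_native - captured) / volume_native * 100

# Captured trades sum exactly to the reported total: 0% drift.
drift_pct(Decimal(1200), [Decimal(1000), Decimal(200)])   # 0

# Half a contract short of a 100-contract total: 0.5%, under the 1% bar.
drift_pct(Decimal(100), [Decimal("99.5")])                # 0.5
```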

The numbers

A 50-market spike in the Phase-0 dataset, ranked by drift:

| Bucket | Count | Avg drift |
| --- | --- | --- |
| <0.01% (perfect) | 50 | 0.0000% |
| 0.01–1% | 0 | |
| ≥1% | 0 | |

Aggregate pass rate: 100% (50/50). Average drift: 0.0000%. Maximum drift across the sample: 0.0000%.

To put that in context: on the same Phase-0 spike, Manifold reconciled 30/40 markets at <0.01% drift (the rest off by ~10% due to a fills[] aggregation issue), and Polymarket via the public data-api reconciled exactly 0/50 because the endpoint caps trade pagination at an offset of 3,500 per market.

Kalshi: 50 of 50, zero residual.

The v2 API field-rename gotcha

Kalshi's v2 API (rolled out in 2025) renamed pricing fields without keeping the old names as fallbacks. If you wrote your scraper against v1 and your v2 build silently returns None on price, this is why:

| v1 field | v2 field | Note |
| --- | --- | --- |
| volume | volume_fp | contracts, fixed-point string |
| liquidity | liquidity_dollars | now in USD, not cents |
| yes_price (cents) | yes_price_dollars | 0.0–1.0, not 1–99 |
| no_price (cents) | no_price_dollars | same |
| count | count_fp | contracts |

We hit this on first probe and the bug was instructive: the v1 field path returned None, so our normalizer defaulted to zero, so every trade had price=0, so every reconciliation was off by 100%. Fixed with a key-presence check (not just or-fallback, because legitimate zeros exist):

# wrong — fallback triggers on legit zero
volume = payload.get("volume_fp") or payload.get("volume")

# right — v1 path only when v2 key is missing
volume = (payload["volume_fp"] if "volume_fp" in payload
          else payload.get("volume"))
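The same key-presence rule generalizes across every renamed pair in the table above. A sketch under those assumptions (pick and V2_TO_V1 are illustrative names; note it only selects the value — unit conversion for the dollars-vs-cents fields still has to happen downstream):

```python
# Hypothetical mapping of each v2 field name to its v1 fallback.
V2_TO_V1 = {
    "volume_fp": "volume",
    "liquidity_dollars": "liquidity",
    "yes_price_dollars": "yes_price",
    "no_price_dollars": "no_price",
    "count_fp": "count",
}

def pick(payload: dict, v2_key: str):
    """Prefer the v2 field whenever its key is present — even if the value
    is falsy — and fall back to the v1 field only when v2 is absent."""
    if v2_key in payload:
        return payload[v2_key]
    return payload.get(V2_TO_V1[v2_key])

pick({"volume_fp": "0"}, "volume_fp")  # "0" — a legitimate zero survives
pick({"volume": 42}, "volume_fp")      # 42 — v1 fallback for old payloads
```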

Why Kalshi is so clean

Three structural reasons, in descending order of importance.

1. CFTC-regulated. Kalshi is a Designated Contract Market under U.S. derivatives law. Their compliance bar around trade reporting is functionally identical to a futures exchange. Public data has to be auditable; that pressure pushes them toward one canonical truth source.

2. Single execution venue per market. Unlike Polymarket (CTFExchange + NegRiskCtfExchange + the CLOB pre-trade routing), Kalshi runs a single matching engine and a single trade record. There's nowhere else for fills to live.

3. Their volume_fp aggregator is well-specified. It's SUM(count_fp) across all OrderFilled events for the market. No fills-array wrinkle, no signed amount wrinkle, no off-chain LP wrinkle.
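Under that spec, the per-market check reduces to an exact fixed-point comparison. A sketch assuming the v2 field names from the table above and a simplified trade payload shape (check_volume is a hypothetical helper):

```python
from decimal import Decimal

def check_volume(volume_fp: str, trades: list[dict]) -> bool:
    """Confirm the venue aggregate equals the sum of per-trade count_fp.
    Both sides are fixed-point strings in the v2 API, so compare as
    Decimal rather than float."""
    captured = sum(Decimal(t["count_fp"]) for t in trades)
    return Decimal(volume_fp) == captured

check_volume("1500", [{"count_fp": "1000"}, {"count_fp": "500"}])  # True
```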

Caveats — not all of Kalshi is equally covered

The Phase-0 sample skews toward recently opened multi-game sports markets — they tend to have small numbers of fills (often a single trade with a large count_fp). That makes reconciliation trivial: there's nothing to mis-aggregate.

For active high-frequency markets (e.g., FOMC rate decisions during the 24h pre-announcement window), trade counts run into the thousands per market. The reconciliation still works mathematically, but the integration-test surface is larger. Phase 1 expands the sample to cover at least 5 hot windows in 2024–2025 with verified per-bucket drift.

Also worth noting: Kalshi's internal dataset (via the research.kalshi.com academic partnership) is the gold standard. We're reconciling against the public REST surface, which is what anyone without a partnership has access to. The two should be identical modulo aggregation lag, but we don't have the partnership feed for cross-check — something to confirm if a customer needs that level of attestation.

Takeaways

  1. Kalshi is the only one of the three venues you can trust on volume out of the box. No on-chain layer, no per-bet aggregation tricks. Sum the trades, match the field, ship.
  2. Watch for the v2 rename. If your scraper was written before late 2025, the silent None defaults bite hard.
  3. Treat the 100% number conservatively. It's audit-grade on the Phase-0 sample but the sample is biased toward fresh sports markets. We'll publish a per-category breakdown in Phase 1.

For research that needs reproducibility against a regulated venue, Kalshi via our reconciled archive is currently the strongest data path we offer. Email us if you're working on a calibration / favorite-longshot paper and want the dump for a specific event window.