About

Built by someone
who needed it himself.

Prediction-market data is fragmented across venues; schemas drift; resolved markets get pruned. The raw data exists, but stitching it together is weeks of nuisance work that every quant in the space rediscovers from scratch. So I'm building the layer.

Operating principle

Prove. Improve. Repeat.

Borrowed from a parallel project. Three rules in one sentence: every claim is backed by an N + a test + a falsification criterion; we ship the smallest thing that proves the next claim; we do it again, weekly.

01 — Prove

Recon over rhetoric

"We have full coverage" is a claim. The recon log is the receipt. If the receipt isn't there, the claim isn't either.

02 — Improve

Ship the smallest gap closer

Polymarket's data-api caps at offset 3,500. We ship the subgraph layer. NegRisk markets aren't in the subgraph. We ship Polygon RPC ingestion. One layer at a time, each measurable.
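The layered fallback above can be sketched in a few lines. This is an illustrative sketch, not our SDK: `fetch_rest` and `fetch_deep` are hypothetical stand-ins for the capped REST endpoint and the deeper source (subgraph or RPC), and the page size is invented.

```python
# Hypothetical sketch of layered ingestion: drain a REST endpoint up to its
# hard offset cap, then hand the remainder off to a deeper source.
# fetch_rest / fetch_deep are stand-ins, not real client calls.
OFFSET_CAP = 3500   # the endpoint stops serving beyond this offset
PAGE = 500          # assumed page size for illustration

def ingest_trades(market_id, fetch_rest, fetch_deep):
    trades, offset = [], 0
    while offset < OFFSET_CAP:
        page = fetch_rest(market_id, offset=offset, limit=PAGE)
        trades.extend(page)
        if len(page) < PAGE:        # history exhausted before the cap
            return trades
        offset += PAGE
    # Cap hit: the REST layer can't see the rest; fall back to the deep source.
    last_ts = trades[-1]["ts"] if trades else None
    trades.extend(fetch_deep(market_id, after=last_ts))
    return trades
```

The point is the seam: each layer is measurable on its own (how many trades it contributed), which is what makes coverage auditable.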

03 — Repeat

Coverage is a moving target

Venues change APIs, contracts get redeployed, new markets come online. The reconciliation harness is how we notice. Daily, automated, audited.

Manifesto

What you should expect from us.

We publish the failures.

Manifold's volume aggregates over per-bet fills[], which our /bets endpoint can't see — that's why ~25% of active markets show ~10% drift. Polymarket's data-api caps at offset 3,500. NegRisk markets need on-chain ingestion. None of this is hidden in the contract; all of it is on the dataset page.

No vendor lock-in.

The schema is documented. Daily Parquet dumps are exportable to your S3 if you want. The Python SDK is thin enough you could rewrite it in a week. We earn our renewal by being useful, not by holding your pipelines hostage.

Survivorship bias is the silent killer.

A backtest run on a scraper's "current" snapshot silently omits every market that has resolved or been pruned. Your strategy will look better than it is. We retain everything via resolutions and deletion_ledger; backtests see the world as it actually was.
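The effect is easy to show with a toy example. The numbers and statuses below are invented for illustration, not taken from any real dataset:

```python
# Toy illustration of survivorship bias: hypothetical markets, each carrying
# the P&L a strategy would have realized on it.
markets = [
    {"id": "m1", "status": "active",   "pnl": 0.04},
    {"id": "m2", "status": "resolved", "pnl": -0.09},
    {"id": "m3", "status": "active",   "pnl": 0.02},
    {"id": "m4", "status": "pruned",   "pnl": -0.05},
]

def mean_pnl(rows):
    return sum(r["pnl"] for r in rows) / len(rows)

# A scraper's "current" snapshot only contains the survivors:
survivors = [m for m in markets if m["status"] == "active"]

biased = mean_pnl(survivors)   # 0.03  -> looks profitable
honest = mean_pnl(markets)     # -0.02 -> actually losing
```

Same strategy, same history; the only difference is whether the dead markets are still in the sample.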

We won't oversell.

If your use case isn't well-served by our coverage today, we tell you. The Phase-0 dataset is small (50 markets per venue) but truthful. We're not pretending it's 1M markets and hoping you don't notice.

Founder

Giulio.

40-year-old solo founder, ex-business owner in Kenya (8 years in B2B machinery sales), now building algorithmic trading and data infrastructure full-time. Parallel project: a sentiment-driven equity trading system targeting FTMO funded accounts — the same operating principles, the same validation rigor (bootstrap CI, permutation tests, walk-forward, BH-FDR), just on equity sentiment data instead of prediction markets.

pred-markets exists because every time the parallel project needed prediction-market history, the answer was "build the scraper yourself, lose a week, lose another week reconciling." So I built it once, properly, with every venue normalized into one schema. Now it's a product.

Reach me directly: hello@pred-markets.com.

Phase 0

Sample available now. Production access on rolling waitlist.