1. Frontier language models are accurate enough for post-event adjudication
Published research in 2025 established a result that the broader market has not yet priced in: given retrieval access to mainstream sources, frontier language models match human resolution on roughly 95% of disputed Polymarket markets. Capponi et al. (2025) is the canonical reference. Follow-on studies have replicated the finding across market categories (political, financial, sports, entertainment) with consistent results.

What matters for Pelion is that the failure cases are characterizable: genuinely inconclusive evidence, contradicting primary sources, and questions whose resolution criteria cannot be cleanly applied to what happened. These are exactly the cases where any adjudicator fails, not specifically a model-based one. Forced-binary oracles are silently wrong on these questions. A protocol that treats them as UNRESOLVABLE is explicitly right.
The 5% residual is not a reason to hold off shipping. It is the specific reason UNRESOLVABLE is a first-class outcome in Pelion’s design.
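
As an illustration only, the difference between a forced-binary oracle and one with a first-class UNRESOLVABLE outcome can be sketched in a few lines of Python. Everything here is an assumption made for the example, not Pelion's actual interface: the `Outcome` enum, the `Assessment` fields, and the `resolve` rule are hypothetical names, and the source counts stand in for whatever evidence summary a real adjudicator would produce.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    YES = "YES"
    NO = "NO"
    UNRESOLVABLE = "UNRESOLVABLE"  # explicit third outcome, not an error state


@dataclass(frozen=True)
class Assessment:
    """Hypothetical summary of what an adjudicator found for one market."""
    criteria_applicable: bool  # can the resolution criteria be applied to what happened?
    sources_for_yes: int       # primary sources supporting YES
    sources_for_no: int        # primary sources supporting NO


def resolve(a: Assessment) -> Outcome:
    """Return YES or NO only when the evidence is one-sided (illustrative rule)."""
    if not a.criteria_applicable:
        return Outcome.UNRESOLVABLE  # criteria cannot be cleanly applied
    if a.sources_for_yes == 0 and a.sources_for_no == 0:
        return Outcome.UNRESOLVABLE  # genuinely inconclusive evidence
    if a.sources_for_yes > 0 and a.sources_for_no > 0:
        return Outcome.UNRESOLVABLE  # contradicting primary sources
    return Outcome.YES if a.sources_for_yes > 0 else Outcome.NO
```

A forced-binary oracle would have to map each of the three UNRESOLVABLE branches onto YES or NO; in this sketch they surface as a distinct outcome that downstream settlement logic can handle explicitly.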