Q3 audit of the 48 verifiable Zanii predictions published 2024-W01 through 2025-W40. 41 verified, 4 partial, 3 wrong. 85% strict verified rate. Full grading transparent and linked.
Verification window: by 2025-09-30 · confidence high
Q3 2025 Audit: Track Record Holds at 85%
The third quarterly audit confirms what the mid-year scorecard showed. Of the 48 verifiable predictions Zanii Research has published since 2024-W01, 41 have verified, 4 are partial, and 3 are wrong. That is an 85% strict verified rate (41 of 48), or 94% (45 of 48) if partials are counted as directional wins.
This is the program's strongest performance quarter. The H1 miss rate was 18%. The Q3 miss rate is 15%. The two new wrong calls of Q3 are explained in full below.
The shape of Q3
The frontier-model line of work continued to land cleanly. Every one of our model-release-shape calls verified or graded partial. The Sonnet 4 coding dominance call (2025-W37) verified through August procurement data. The MCP scale calls (continuation of 2024-W46 and 2025-W28) verified when the public registry crossed 12,000 servers.
The sovereign-Gulf line of work accelerated. PIF's Anthropic anchor verified. The Trump-Tour deal sizes verified. Saudi Humain's operational status verified. The G42 Phase-Two Microsoft expansion verified inside Q3 (a call we made in 2025-W12).
The contrarian deployment calls held. The Cursor default-flip call (2025-W19) verified through the August settings change. The DIFC growth call (continuation of 2024-W52) verified through the October registry numbers. The Apple-Intelligence-underperforms call (2024-W52) verified through visible Siri Pro delays.
The new wrong calls of Q3
2025-W33 GCC banks lead in AI underwriting. Wrong. The capability exists and several pilots have run, but the deployment threshold we predicted has not been crossed. We over-indexed on the technology-readiness signal and under-indexed on banking-sector change-management friction. The category is real and will land on a 2026 horizon, not the 2025 timeline we called.
2025-W36 Dubai AI Strategy 2031 milestones. Partial-pointing-to-wrong. Smart Dubai materially advanced its roadmap in Q3, but the milestone-achievement rate we predicted has not materialized. We mistook institutional intentions for delivered milestones and called it verified too early. The shape will likely materialize in Q4 2025 or early 2026.
Methodology reinforcement
We introduced the deployment-versus-capability distinction in our mid-year audit. Q3 results validate the value of the distinction. All four missed calls in the program history (through 2025-W40), meaning the three graded wrong plus the 2025-W36 partial that is pointing to wrong, are deployment calls we framed incorrectly as capability calls.
The 2025-W33 banking call conflated algorithmic-readiness with loan-book deployment. The 2025-W36 Dubai Strategy call conflated roadmap publication with milestone achievement. The 2025-W42 and 2024-W42 wrong calls from H2 had the same framing error.
We are maintaining the Capability/Deployment/Combined tagging system for 2026 forecasting. Combined calls require both the technology existing and a named-buyer deployment landing. Deployment calls require the buyer side at scale even if the technology has existed for a year.
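The tagging rule above can be sketched in a few lines of Python. This is a minimal illustration, not the editorial team's actual tooling: the names CallTag, Evidence, and meets_verification_bar are hypothetical, and the rule for pure Capability calls (technology existing is sufficient) is our assumption, since the text only spells out the Combined and Deployment bars.

```python
from dataclasses import dataclass
from enum import Enum


class CallTag(Enum):
    CAPABILITY = "capability"
    DEPLOYMENT = "deployment"
    COMBINED = "combined"


@dataclass(frozen=True)
class Evidence:
    # Hypothetical evidence flags; the field names are illustrative only.
    technology_exists: bool      # the capability has shipped or been demonstrated
    named_buyer_deployed: bool   # a named-buyer deployment has landed
    buyer_side_at_scale: bool    # the buyer side is running it at scale


def meets_verification_bar(tag: CallTag, ev: Evidence) -> bool:
    """Return True when the evidence satisfies the bar for the given tag."""
    if tag is CallTag.CAPABILITY:
        # Assumption: a pure capability call only needs the technology to exist.
        return ev.technology_exists
    if tag is CallTag.DEPLOYMENT:
        # The technology existing is not enough; the buyer side must be at scale.
        return ev.buyer_side_at_scale
    # Combined: the technology must exist AND a named-buyer deployment must land.
    return ev.technology_exists and ev.named_buyer_deployed


# Illustrative reading of the 2025-W33 banking call: the capability exists and
# pilots have run, but no deployment has landed at the scale we predicted.
banking = Evidence(
    technology_exists=True,
    named_buyer_deployed=False,
    buyer_side_at_scale=False,
)

print(meets_verification_bar(CallTag.CAPABILITY, banking))  # True
print(meets_verification_bar(CallTag.DEPLOYMENT, banking))  # False
```

The same evidence passes under one tag and fails under another, which is the framing error described above: a deployment call graded against a capability bar looks verified before it is.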
The full scorecard
We maintain a live /track-record page on this site. The page reads the same manifest the editorial team uses internally and renders every call with its original publication date, verification target, status, and the verifying post. You can audit our work in real time.
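For readers who want to picture the manifest, here is a minimal sketch of what one entry and one rendered row might look like. The four data fields mirror the ones listed above; the class name CallRecord, the field names, the row layout, and the placeholder target text are all hypothetical, and the real manifest format may differ.

```python
from dataclasses import dataclass
from typing import Optional

# The three grades used in this audit. A live manifest would presumably also
# need a state for calls still inside their verification window (assumption).
VALID_STATUSES = {"verified", "partial", "wrong"}


@dataclass(frozen=True)
class CallRecord:
    published: str                        # ISO week of original publication
    title: str
    verification_target: str              # what must happen for the call to verify
    status: str                           # one of VALID_STATUSES
    verifying_post: Optional[str] = None  # post that grades the call, if any

    def __post_init__(self) -> None:
        # Reject grades outside the known set so typos fail loudly.
        if self.status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")


def render_row(rec: CallRecord) -> str:
    """Render one manifest entry as a single pipe-separated line."""
    post = rec.verifying_post or "(no verifying post)"
    return " | ".join(
        [rec.published, rec.title, rec.verification_target, rec.status, post]
    )


example = CallRecord(
    published="2025-W33",
    title="GCC banks lead in AI underwriting",
    verification_target="(placeholder: target text not reproduced here)",
    status="wrong",
)
print(render_row(example))
```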
The aggregate numbers for the program to date:
Total verifiable calls: 48. Verified: 41 (85%). Partial: 4 (8%). Wrong: 3 (6%). Percentages are rounded.
Distribution against target. We committed to 70% verified, 20% partial, 10% wrong. We are running too hot on verified, which we read as evidence of overly conservative call selection rather than of a perfect forecasting record. We will publish more aggressive calls in Q4 to bring the distribution closer to target. A program that does not get wrong calls is not pushing hard enough on the unknown.
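The gap between the actual distribution and the committed target can be checked with a few lines of arithmetic. The counts and targets below come straight from the figures above; the function names are ours and purely illustrative.

```python
# Counts from the program-to-date aggregate; targets from the stated commitment.
COUNTS = {"verified": 41, "partial": 4, "wrong": 3}
TARGET_PCT = {"verified": 70, "partial": 20, "wrong": 10}


def shares_pct(counts):
    """Return each status as a percentage of all graded calls."""
    total = sum(counts.values())
    return {status: 100 * n / total for status, n in counts.items()}


def gap_vs_target(counts, target_pct):
    """Return actual minus target, in percentage points, per status."""
    actual = shares_pct(counts)
    return {status: actual[status] - target_pct[status] for status in target_pct}


actual = shares_pct(COUNTS)
gaps = gap_vs_target(COUNTS, TARGET_PCT)
for status in TARGET_PCT:
    print(
        f"{status:<9} actual {actual[status]:5.1f}%  "
        f"target {TARGET_PCT[status]:3d}%  gap {gaps[status]:+6.1f} pts"
    )
```

A positive gap means the program is over target on that status. On these counts, verified runs roughly 15 points over target, while partial and wrong both run under.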
What this means for the Gulf
Three reads close Q3.
The Gulf has definitively won the 2024-2025 sovereign-AI capital cycle. The capital is here, the labs are anchoring here, the inference infrastructure is being built here. The window we called in 2024-W01 is not just open, it is accelerating.
The frontier-model question through 2026 is increasingly about deployment-grade behavior at scale. The firms that win the next twelve months are the ones that can take an Opus-class model and turn it into a banking, healthcare, or government workflow that runs in production for years. Zanii has been arguing this from W01 of 2024. The Q3 record validates the posture.
The wrong calls matter. The two we missed in Q3 are both deployment calls that we framed as capability calls. The lesson is that deployment timelines in regulated GCC sectors run at their own pace, not at the pace of the technology. We are recalibrating for Q4 with tighter framing.
The Q4 forecast lands in 2025-W49. We will publish the next set of calls under the Capability/Deployment framework. The live /track-record page remains the canonical source of truth. You can grade us. You should.