Annual scorecard for 2024 plus 2025. 156 verifiable predictions on the record. 123 verified (79%), 22 partial (14%), 11 wrong (7%). Methodology unchanged. Two-year track record locked.
Verification window: by 2025-12-31 · confidence n/a
- 2024-W52
- 2025-W49
Year-End Track Record: 79% Verified on 156 Predictions
This is the second full annual scorecard for the Zanii Research program. Twenty-four months of public predictions. 156 verifiable calls. 123 verified, 22 partial, 11 wrong. A 79% strict verified rate. 93% directional, counting verified plus partial.
The audit cycle was supposed to surface our errors more clearly than the wins. In some categories it did (robotics, tokenization). In others it confirmed the original thesis at scale.
The two-year position holds. Below: how it locked.
The 2025 ten-call methodology, graded
We committed in 2024-W52 to grade the ten 2025 calls against the verification criteria we named. Final grades:
1. Anthropic raises at $60B. Verified. Q1 round closed at $61.5B.
2. Chinese frontier-class reasoning triggers Nvidia re-pricing in first six weeks. Verified. DeepSeek R1 plus the Monday session.
3. Trump AI Action Plan inside 120 days with three components. Verified.
4. Saudi sovereign-AI commercial entity in Q1. Verified. Humain launched.
5. MBZUAI R1-class reasoning model. Partial. Two strong papers, release slipped from Q4 into early 2026.
6. MCP crosses 10,000 servers. Verified comfortably. The Anthropic directory crossed 12,400 in late October.
7. Voice agents become default support at three top-twenty GCC banks. Verified. Emirates NBD, ADCB, and Al Rajhi all in production.
8. DIFC AI license framework crosses 50 entities. Verified.
9. Cursor default flips to Anthropic and stays. Verified. Locked in February, no sustained reversal through year-end.
10. Apple Intelligence underperforms keynote claims. Verified. Siri Pro delay, Q2 feature retrenchment, quiet pricing revision in H2.
Nine verified, one partial. Distribution: 90% strict, 100% directional.
The aggregate two-year position
Total verifiable calls: 156. Verified: 123 (79%). Partial: 22 (14%). Wrong: 11 (7%). Pending: 0 in the 2024 plus 2025 program (forward-dated 2026 calls will start carrying pending status into next year).
Distribution against target. We committed to 70/20/10 (verified/partial/wrong). We are running 79/14/7 across two years, which means our calls have skewed too safe. We will publish more aggressive 2026 forecasts to bring the distribution closer to target.
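For readers who want to reproduce the arithmetic, here is a minimal Python sketch. It assumes the directional rate counts verified plus partial calls, which is consistent with the published figures but is not a definition stated in the scorecard. The function name `rates` and the dictionary keys are illustrative only.

```python
def rates(verified, partial, wrong):
    """Return whole-number percentages for a graded set of calls.

    Assumption: 'directional' = verified + partial. The name and keys
    here are hypothetical, chosen for illustration.
    """
    total = verified + partial + wrong

    def pct(n):
        return round(100 * n / total)

    return {
        "total": total,
        "verified": pct(verified),
        "partial": pct(partial),
        "wrong": pct(wrong),
        "directional": pct(verified + partial),
    }


# Two-year aggregate: 123 verified, 22 partial, 11 wrong.
two_year = rates(123, 22, 11)

# The 2025 ten-call slate: nine verified, one partial, none wrong.
ten_call = rates(9, 1, 0)

# Committed target distribution (verified/partial/wrong).
target = {"verified": 70, "partial": 20, "wrong": 10}

# Percentage-point gap between the running distribution and the target.
gap = {k: two_year[k] - target[k] for k in target}

print(two_year)
print(ten_call)
print(gap)
```

The gap comes out positive on verified and negative on partial and wrong, which is the sense in which the running distribution sits above the 70/20/10 target.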
What two years of audit tells us
Three meta-observations.
The frontier-model line of work is the hardest to call but the cleanest to verify. Our model-release-shape calls have a near-perfect record across two years. The releases are public, the benchmarks are public, the verification is binary.
The sovereign-Gulf line of work is hard to call cleanly, but its outcomes are easy to verify because they land in public. Calls about PIF, G42, Humain, MBZUAI, and the DIFC framework have landed at very high rates. The underlying dynamic is that GCC actors are operating with more strategic intent than the rest of the AI world gives them credit for.
The deployment line of work is the trickiest. Our deployment calls (robotics, OpenAI Operator, ADGM AI-securities) have been the main source of wrong calls. We introduced the capability-versus-deployment distinction in 2025-W49 and the 2026 program will use this distinction explicitly.
What this means for the Gulf
The two-year position confirms the 2024-W01 thesis. The Gulf has won the sovereign-AI capital cycle. The deployment cycle is open and running ahead of where most non-Gulf analysts model it. The indigenous vendor class is emerging on schedule.
The 2026 forecast lands in 2026-W01. The program continues weekly. The live /track-record page maintains the running grade. You can check our work. We have published our wrong calls with the same prominence as our right ones. That is the asset we have built and intend to keep building.