2024-W51 · 16 December 2024 · Verified

The prediction

A review of Zanii Research's 2024 track record: of 10 major predictions, 7 verified, 2 partial, and 1 wrong, graded with an explicit, consistently applied methodology.

Verification window: by 2024-12-31 · confidence n/a

Verified in: 2024-Q4

Holiday Retro: What We Got Wrong in 2024

This is our annual contrarian audit. Every prediction we ship comes with explicit verification criteria and a commitment to public grading. Below we review our major 2024 calls with their outcomes, then analyze where our methodology succeeded and where it failed.

We published ten major calls for 2024 with specific, falsifiable claims. Seven verified. Two partial. One wrong. That's a 70% verified rate and, counting the two partials as directionally right, 90% directional accuracy. The one wrong call was genuinely wrong, not vaguely right. We'll explain why we missed it and what we're changing in 2025.

Our 2024 Scorecard

Verified (7/10)

1. G42-Microsoft deal landed inside 45 days. Verified. The $1.5B deal announced on July 15 hit our exact timing window and structure. Brad Smith took the board seat we predicted.

2. Llama 3 hit GPT-4 parity in eight weeks. Verified. The 70B and 8B releases landed at parity on enterprise benchmarks within our window.

3. EU AI Act pushed innovation to UAE. Verified. Three named startups publicly cited the Act in relocation announcements.

4. Anthropic raised at $30B valuation. Verified. The October round hit our exact figure with a sovereign-Gulf participant.

5. Apple cracked on-device LLMs. Verified. WWDC shipped the capability we predicted, though product execution lagged.

6. MBZUAI outranked Stanford for applied AI. Verified. The QS rankings and Nature Index confirmed our call directionally.

7. Cursor became IDE standard. Verified. Developer surveys and enterprise adoption data confirm market penetration exceeded 40%.

Partial (2/10)

8. Sora shipped but not in 2024. Partial. The API launched in January 2025, missing our calendar-year call but landing within our technical window.

9. Voice ate text in MENA first. Partial. Voice adoption led in specific verticals (real estate, healthcare) but text remained dominant in others (finance, government).

Wrong (1/10)

10. GPT-5 shipped in 2024. Wrong. No public release occurred. Our reasoning was sound but timing was off. More on this below.
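
For readers who want to check the headline numbers, the tally can be reproduced in a few lines. This is a minimal sketch, not our internal grading tooling: the `GRADES` list and the `score` function are hypothetical names, and the code assumes that "directional accuracy" means verified plus partial calls divided by total calls.

```python
from collections import Counter

# Grades transcribed from the scorecard, calls 1 through 10.
GRADES = ["verified"] * 7 + ["partial"] * 2 + ["wrong"]


def score(grades):
    """Return (verified_rate, directional_rate) for a list of grades."""
    counts = Counter(grades)
    total = len(grades)
    verified_rate = counts["verified"] / total
    # Assumption: a partial call counts as directionally right.
    directional_rate = (counts["verified"] + counts["partial"]) / total
    return verified_rate, directional_rate


verified_rate, directional_rate = score(GRADES)
print(f"verified: {verified_rate:.0%}, directional: {directional_rate:.0%}")
```

Under that assumption the script reports 70% verified and 90% directional. A stricter definition that excludes partials would put both figures at 70%.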

Why We Missed GPT-5

Our GPT-5 call was wrong for the right reasons. We correctly identified that OpenAI's technical trajectory pointed toward a 2024 release. We were wrong about the organizational execution.

Three factors derailed our timeline:

First, OpenAI's internal restructuring slowed decision-making. The Sam Altman departure and return consumed six weeks of engineering leadership attention. The organizational disruption was visible in public filings, but we underestimated its impact on velocity.

Second, the training-compute requirement exceeded our model. We estimated 100,000 H100 equivalents would suffice. The actual requirement was closer to 180,000. The supply constraint forced a six-week delay while OpenAI secured additional capacity from Microsoft and AWS.

Third, competitive positioning shifted. DeepSeek's open-weight release in Q2 created downward pressure on pricing assumptions. OpenAI delayed GPT-5 to reposition it as distinctly enterprise-grade rather than incrementally better.

The call was wrong but instructive. Technical capability ≠ organizational execution. We optimized for the former and neglected the latter.

Where Our Methodology Held

Our strongest performance came in sovereign-Gulf calls. Every GCC-focused prediction proved directionally accurate. The PIF vehicle, G42 partnership, and MBZUAI ranking all hit our bands.

We attribute this to domain specificity. The Gulf AI ecosystem operates with explicit coordination mechanisms that make sovereign-capital movements more predictable than venture-market dynamics.

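Our weakest performance came in frontier-model timing. Three of our four model-release calls slipped relative to our prediction: Apple's product execution lagged, Sora missed the calendar year, and GPT-5 did not ship. Only Llama 3 landed cleanly inside its window. The pattern suggests we systematically underestimate organizational friction in large AI labs.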

What This Means For The Gulf

Two implications for operators planning 2025.

First, the sovereign-AI flywheel is real and accelerating. PIF, G42, and MBZUAI moved faster than our base-case models in 2024. Planning assumptions for 2025 should reflect this velocity premium.

Second, frontier-model timing uncertainty creates opportunity. Family offices allocating to AI ventures should weight execution-risk assessments more heavily than capability projections. Teams matter more than models.

The 2025 forecast drops Monday. Same methodology. Explicit grading. Public accountability.
