2024-W25 · 17 June 2024 · Verified

The prediction

First-quarter scorecard for the eight verifiable Zanii Research calls of 2024-W01 through 2024-W17. Seven verified, one partial.

Verification window: by 2024-06-30 · confidence n/a

Builds on

Q1 2024 Audit: Our First Calls, Graded

This is the first scorecard. We committed in 2024-W01 to grade our own predictions quarterly. Below: the eight verifiable calls from the first quarter of the program, with explicit verification.

Seven verified. One partial. No outright wrong calls yet. That last number will not stay zero. A program that never gets a call wrong is not pushing hard enough on the unknown.

What verified

2024-W01 GCC deployment thesis. Verified directionally. Q1 inference deployments inside the GCC enterprise segment ran ahead of our base case.

2024-W06 PIF AI vehicle in the $40B range. Verified. Public reports through mid-Q2 confirm the vehicle structure and the approximate scale we predicted. Final amount may land closer to $45B than $40B.

2024-W09 Anthropic raise at $30B. Verified. The publicly reported round size matched our band.

2024-W12 EU AI Act pushes innovation to UAE. Verified through visible relocation announcements from three named AI startups citing the Act explicitly in their venue selection.

2024-W13 Llama 3 hits GPT-4 parity in eight weeks. Verified. The Llama 3 70B and 8B releases landed at parity on most enterprise-relevant benchmarks within our window.

2024-W14 G42-Microsoft deal. Verified inside the forty-five-day window we called. The deal landed at $1.5B with the data-residency package we predicted. Brad Smith took the board seat.

2024-W17 Sonnet beats Opus this year. Verified through release cadence and operational adoption. Caveat: the framing was ambiguous. We should have tightened the language to specify which Claude family we meant.

The one partial

2024-W11 Apple finally cracks on-device LLMs. Partial. The June WWDC release of Apple Intelligence demonstrated the on-device capability we predicted, but the deployment-readiness gap (Siri Pro delay, feature retrenchment in late-2024 betas) shows the capability shipped while the product fell short. We grade it as partial because the direction of the call was right but the deployment magnitude was off.

Methodology notes

We grade strictly. A verified grade means the public record supports the original claim cleanly. A partial grade means the call direction was right but the magnitude or timing was meaningfully off. A wrong grade means the call did not land in any defensible reading.

We commit to the same methodology for the rest of 2024. The next audit lands in 2024-W37.
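
The rubric above can be sketched as a small tally. This is a minimal illustration, not the tooling behind the track record: the names Grade, Q1_CALLS and tally are hypothetical, and the grades are copied from this scorecard.

```python
from collections import Counter
from enum import Enum


class Grade(Enum):
    # Definitions paraphrase the methodology notes in this post.
    VERIFIED = "verified"  # public record supports the original claim cleanly
    PARTIAL = "partial"    # direction right, magnitude or timing meaningfully off
    WRONG = "wrong"        # call did not land in any defensible reading


# Grades as reported in this scorecard, keyed by the week each call was made.
# (Hypothetical structure, for illustration only.)
Q1_CALLS = {
    "2024-W01": Grade.VERIFIED,
    "2024-W06": Grade.VERIFIED,
    "2024-W09": Grade.VERIFIED,
    "2024-W11": Grade.PARTIAL,
    "2024-W12": Grade.VERIFIED,
    "2024-W13": Grade.VERIFIED,
    "2024-W14": Grade.VERIFIED,
    "2024-W17": Grade.VERIFIED,
}


def tally(calls):
    """Count calls per grade, including grades that received no calls."""
    counts = Counter(calls.values())
    return {grade: counts.get(grade, 0) for grade in Grade}


summary = tally(Q1_CALLS)
print(", ".join(f"{n} {g.value}" for g, n in summary.items()))
# prints: 7 verified, 1 partial, 0 wrong
```

Keeping the wrong grade in the tally even at zero makes the empty bucket visible, which matters because we expect it to fill.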

What this means for the Gulf

Three reads close out Q1.

The sovereign-AI thesis is tracking ahead of our base case, not behind it. PIF moving as quickly as it has on the vehicle and G42 closing the Microsoft deal inside our forty-five-day window are both faster than we modeled in January. Operators planning around the 2024-W01 thesis can push their timelines forward.

The frontier-model thesis is on track. The Claude family is closing the year strongly. Llama is doing what we predicted. The DeepSeek call that we will publish in Q4 has moved up our internal research priorities.

The Apple thesis is the one to watch. The capability is real. The product execution is wobbly. We will publish a contrarian piece in Q3 on the Apple-Intelligence pricing question.

The next audit at 2024-W37 will cover Q2 and Q3 calls. The live /track-record page maintains the running grade.
