The Three-Tier Template

Apple unveiled a rebuilt Siri at WWDC on June 8. The 1.2-trillion-parameter Google Gemini model behind it runs through a three-tier routing system: simple queries stay on-device, mid-complexity requests route to Apple's Private Cloud Compute, and heavy reasoning tasks land on Google Cloud's Nvidia B200 GPUs. The routing template is now public.

The detail that matters is that this is the first time a major consumer-software vendor has published a three-tier inference architecture as the design for an AI product shipping to two billion devices. Apple Intelligence's prior inference layout had two tiers and ran exclusively on Apple silicon. The June 8 reveal adds a third tier that explicitly routes to a competitor's cloud, on hardware Apple doesn't make, behind a model Apple doesn't train. Every product team running an AI feature now has a public reference design from the vendor with the largest deployed device fleet in the world.

What Apple published isn't just a Siri rebuild. It's a public commitment that the next decade of consumer-AI shipping happens through vendor-mix at the inference layer, not through proprietary stacks. Apple has the longest-running commitment to vertical integration of any consumer-product company, and the June 8 architecture says that vertical integration ends at the model layer.

The three-tier inference routing Apple published for Siri at WWDC on June 8.
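The routing pattern itself is simple to express. As a minimal sketch, assume each query arrives with a complexity score between 0 and 1 from some upstream classifier. Apple's routing criteria aren't described here, so the score, the thresholds, and every name below (`Tier`, `TierBoundaries`, `SERVED_BY`, `route`) are hypothetical. Under that assumption, a three-tier router is two boundaries plus a separate table saying who serves each tier:

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ON_DEVICE = "on_device"          # small local model
    PRIVATE_CLOUD = "private_cloud"  # first-party cloud (Private Cloud Compute, in Siri's case)
    PARTNER_CLOUD = "partner_cloud"  # third-party frontier model (Gemini on B200s, in Siri's case)


@dataclass(frozen=True)
class TierBoundaries:
    # Hypothetical cut-offs on a complexity score in [0, 1]; both are inclusive upper bounds.
    on_device_max: float = 0.3
    private_cloud_max: float = 0.7

    def __post_init__(self) -> None:
        if not 0.0 <= self.on_device_max <= self.private_cloud_max <= 1.0:
            raise ValueError("need 0 <= on_device_max <= private_cloud_max <= 1")


# Which vendor serves each tier is a separate, swappable decision (placeholder labels).
SERVED_BY = {
    Tier.ON_DEVICE: "local model",
    Tier.PRIVATE_CLOUD: "first-party cloud model",
    Tier.PARTNER_CLOUD: "third-party frontier model",
}


def route(complexity: float, bounds: TierBoundaries = TierBoundaries()) -> Tier:
    """Send a query to the lowest tier whose boundary covers its complexity score."""
    if not 0.0 <= complexity <= 1.0:
        raise ValueError("complexity must be in [0, 1]")
    if complexity <= bounds.on_device_max:
        return Tier.ON_DEVICE
    if complexity <= bounds.private_cloud_max:
        return Tier.PRIVATE_CLOUD
    return Tier.PARTNER_CLOUD


if __name__ == "__main__":
    for score in (0.1, 0.5, 0.9):
        tier = route(score)
        print(f"{score:.1f} -> {tier.value} ({SERVED_BY[tier]})")
```

Keeping `TierBoundaries` and `SERVED_BY` apart mirrors the two decisions the piece argues teams now face: moving a boundary changes which queries reach a tier, while swapping a vendor changes only who answers them.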

Imagine a 40-engineer product team shipping an AI feature in a consumer app. Today the team's procurement decision comes down to two options: standardize on one model vendor for cleaner contracts and a single inference bill, or route across multiple vendors, accepting more operational complexity in exchange for cheaper heavy queries. Last year that choice would have been contested, because no large-fleet reference architecture had been published and the team's CTO could only point to vendor decks. After June 8, the CTO can point to Apple's three-tier reveal, and the conversation moves from "should we vendor-mix" to "where do we put the tier boundaries."

The asymmetry is the point. For two years, vendors selling inference at scale have been telling buyers that vendor-mix is operationally fine, and buyers had to take their word for it. Apple just showed the work, with the largest possible blast radius, in a product it can't afford to embarrass itself on. The architecture is now the default starting point for any AI product team's inference design, and the burden of proof shifts to whoever argues against the multi-tier design.

The steelman is that Apple's three-tier design is bespoke to a problem Apple specifically faces: two billion devices with constrained on-device compute, a hosted Apple cloud, and a Google deal for what Apple silicon won't run. That holds at the integration level, where a product team without those preconditions faces different math. It doesn't hold at the design level. A team routing across AWS Bedrock between open-weights and closed frontier models is solving the same routing problem at a different scale, and the design transfers cleanly. What Apple just normalized is the vendor-mix, not the specific hardware.

The thing worth seeing is that the consumer-product company most committed to vertical integration in the world just published the design pattern that ends vertical integration at the model layer. The architecture is now the public reference every product team will copy. The inference question for the rest of 2026 isn't which AI vendor to standardize on; it's where the tier boundaries sit in your specific product, and which vendor sits at each tier.

What to Do With This

Pull up your current AI product architecture this week. If you're running a single-vendor stack, map which query types would route differently if you opened the heavy-reasoning tier to a second vendor. If you're already vendor-mixing, compare your tier boundaries with the ones Apple published and note where they diverge.

Either way, your next architecture review now has a public reference point that didn't exist on June 7.
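For the single-vendor mapping step, one place to start is a sample of logged production queries, each tagged with a query type and a complexity score. Both tags are assumptions here; use whatever signal your team already trusts, and treat `heavy_tier_share` and the sample data as hypothetical. Counting, per query type, how much traffic would cross a proposed heavy-reasoning boundary shows which types would move to a second vendor:

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Iterable


def heavy_tier_share(
    queries: Iterable[tuple[str, float]],
    heavy_threshold: float,
) -> dict[str, float]:
    """Per query type, the fraction of logged queries scoring above the proposed
    heavy-reasoning boundary, i.e. the traffic a second vendor would take over."""
    totals: Counter[str] = Counter()
    moved: Counter[str] = Counter()
    for query_type, complexity in queries:
        totals[query_type] += 1
        if complexity > heavy_threshold:
            moved[query_type] += 1
    # Counter returns 0 for types that never crossed the boundary.
    return {qtype: moved[qtype] / totals[qtype] for qtype in sorted(totals)}


if __name__ == "__main__":
    # Hypothetical log sample: (query type, complexity score in [0, 1]).
    sample = [
        ("set_timer", 0.05),
        ("summarize", 0.2),
        ("summarize", 0.8),
        ("plan_trip", 0.9),
    ]
    for threshold in (0.6, 0.85):
        print(threshold, heavy_tier_share(sample, threshold))
```

Types with a high share are the candidates for a second vendor. Re-running across a few thresholds, as the example does, shows how sensitive that list is to where the boundary sits.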

Also on the Radar

Anthropic Discloses Claude Wrote 80% of Its Production Code

Anthropic Institute published "When AI builds itself" on June 4, disclosing that over 80% of the code merged into Anthropic's production codebase in May was authored by Claude and that engineers are merging eight times the daily code volume of a 2024 baseline. The same paper calls for a verifiable, multi-country mechanism to slow frontier development if recursive self-improvement crosses a threshold, which the paper leaves unspecified. For buyers standardizing on Claude Code, the lab itself just supplied both the strongest case for deepening the deployment and the strongest case for treating it as a risk.

Apollo and Blackstone Close $35B Chip Financing for Anthropic

Apollo Global Management and Blackstone finalized a $35 billion debt deal on June 5 to buy Google TPUs for Anthropic to lease, structured through a special-purpose vehicle that keeps the hardware off Anthropic's balance sheet. Broadcom is backstopping the senior tranches. For the next round of frontier-lab capex, the structure is the private-credit template other labs are likely to copy once equity-funded buildouts hit their ceiling.

Want More Than This Newsletter?

Alcreon publishes a daily AI briefing, long-form dossiers, and an analysis feed for the teams actually shipping AI in production. This newsletter is one read out of the full library.

Read the daily feed or browse the editorials.
