III · First principles
Language before reason.
The whole paradigm is upside down.
The trillion-dollar buildout rests on a single unexamined assumption: that larger models, trained on more tokens, running on more chips, converge asymptotically on intelligence. That assumption inverts the order in which cognition actually works. Reason comes first. Language is the wrapping you put on reason so it can travel between people. Large language models do the opposite. They put language first and try to bootstrap reason out of statistical regularities in tokens. This is not a tuning problem to be fixed by the next generation. It is a foundational architectural error, and every additional dollar of capital expenditure compounds it.6
The parameter trap.
Brute-force language processing, driven by ever-larger stacks of ever-hotter chips, produces an increasingly sophisticated mirror of human linguistic output. It does not produce understanding. Understanding is the capacity to reason in the absence of language, and reason has to exist first for language to be layered on top of it coherently. When the architecture is built the other way around, hallucination is not a bug. Hallucination is the feature doing exactly what it was designed to do: generating plausible next-token sequences at the ragged edges of the training distribution. The system lives permanently in a middle state. It can mimic reason because the humans who wrote the training corpus were reasoning. It is not itself reasoning.
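The mechanism is visible in a toy sampler. The sketch below (illustrative vocabulary, hypothetical logits, not any production model) shows why a next-token architecture has no built-in way to abstain: softmax always redistributes probability mass over the vocabulary, so even when the scores are nearly uniform, the sampler confidently emits some fluent-looking token.

```python
import math
import random

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(vocab, logits, rng):
    """Sample one token from the softmax distribution.
    There is no 'I don't know' path: mass is always spread
    over the vocabulary, so a plausible-looking token is
    always emitted, however unsure the scores are."""
    probs = softmax(logits)
    r = rng.random()
    cum = 0.0
    for tok, p in zip(vocab, probs):
        cum += p
        if r < cum:
            return tok
    return vocab[-1]

# Hypothetical logits for a prompt far from the training
# distribution: the top scores are nearly uniform, and the
# abstention token is scored far below them.
vocab = ["Paris", "Lyon", "Atlantis", "<unsure>"]
logits = [1.1, 1.0, 0.9, -5.0]
print(sample_next_token(vocab, logits, random.Random(0)))
```

With these toy numbers the three fluent candidates each carry roughly a third of the probability while the abstention token carries well under one per cent, so the sampler returns a confident answer regardless of whether any of the candidates is grounded.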
The working test case is nearly a century and a half old. In 1880, the Smithsonian documented a deaf man, born without language, reasoning about mortality, causation and cosmology before he had acquired a single word. Reason demonstrated in the absence of language. That capacity is precisely what the current paradigm cannot build and will not buy with three trillion dollars of graphics processors.
An entity does not possess the capacity for understanding until reason is demonstrated in the absence of language.6
What this does to the valuation.
Every model release is a bet that the next trillion parameters will deliver the step change in capability the last ten did not. Scaling-law curves are visibly flattening. The data wall is real. The energy ceiling is real. The cost per useful inference is moving in the wrong direction as capability gains slow. Competing reason-first approaches (System Two compression, neural-symbolic hybrids, AlphaGeometry-style integrations of search with language) do not require the three-trillion-dollar buildout to succeed. They need reason itself to work, not more compute. If any one of them lands inside the current capital-expenditure cycle, the scaling myth underwriting Nvidia, the hyperscalers and OpenAI is not marked down by fifty per cent. It is redundant.
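The flattening can be made concrete with the published Chinchilla-style scaling law, L(N, D) = E + A/N^α + B/D^β, using the constants fitted by Hoffmann et al. (2022). The sketch below holds the data budget fixed and scales parameters tenfold per step; treat it as an illustration of the curve's shape under those fitted constants, not a forecast for any particular model.

```python
# Chinchilla-style scaling law (Hoffmann et al., 2022):
#   L(N, D) = E + A / N**alpha + B / D**beta
# where N is parameter count and D is training tokens.
E, A, B = 1.69, 406.4, 410.7
alpha, beta = 0.34, 0.28

def loss(n_params, n_tokens):
    """Predicted pretraining loss under the fitted law."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Fix the data budget at 10 trillion tokens and scale
# parameters 10x per step: each step buys a smaller
# absolute reduction in loss than the step before it.
D = 10e12
prev = None
for N in [1e9, 1e10, 1e11, 1e12, 1e13]:
    current = loss(N, D)
    if prev is not None:
        print(f"N = {N:.0e}: loss {current:.4f} "
              f"(improvement {prev - current:.4f})")
    else:
        print(f"N = {N:.0e}: loss {current:.4f}")
    prev = current
```

Each tenfold increase in parameters shrinks only the A/N^α term, and by a constant factor, so the absolute improvement decays geometrically toward the irreducible floor E. That is the shape of the bet: the curve never turns upward, and the floor is never zero.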