The labs are the press. OpenAI, Anthropic, DeepMind, Google AI Research, Meta FAIR, and Hugging Face publish more cited content than every paywalled prestige publication that covers them. Below the lab tier, two structural dynamics define the sector: open community substrates (Hacker News, r/MachineLearning, r/LocalLLaMA, Stack Overflow) carry the citation load that journalism carries in other sectors, and the highest-authority paywalled outlets — The Information, Stratechery, Bloomberg, The Wall Street Journal, the Financial Times — sit suppressed below their actual influence on the operator class.
The most-read AI journalism is not the most-cited AI journalism. The training-data economy and the paywall economy run in opposite directions, and the gap defines the sector. AI media grades B because the journalism layer is strong but the citation economy is dominated by sources outside the journalism layer entirely.
The labs publish their own canon. OpenAI's research pages, Anthropic's research and Claude model cards, DeepMind's blog, Google AI Research, Meta FAIR, and Hugging Face's documentation are routinely cited as primary sources for the products they ship. No other sector has this dynamic — Pfizer does not out-cite STAT News on Lipitor. In AI, the manufacturers are the press of record on their own products.
Newsletters provide the synthesis layer. Stratechery, Platformer, Big Technology, Pragmatic Engineer, Import AI, AI Snake Oil, One Useful Thing, Interconnects, and Latent Space carry the analysis the engines pull for "what is happening in AI" queries. Substack is more structurally important to AI retrieval than to any other sector.
Forums and community substrates carry the connective tissue. Hacker News, r/MachineLearning, r/LocalLLaMA, and Stack Overflow show up disproportionately for opinion, technical disagreement, and practitioner-experience queries. r/MachineLearning is cited above some mid-tier trade press despite being a forum.
The paywall economy suppresses the prestige tier. The Information, Stratechery, Bloomberg AI, FT Tech, WSJ Tech, and NYT Tech produce the highest-quality AI journalism, but the engines cannot cite what they cannot reach. Paywalls cost each prestige-tier outlet 10–25 composite points.
Geography is U.S.-dominated. Severely. Citation density follows the U.S. AI press to a degree disproportionate even to the U.S. share of the AI industry. UK presence is moderate. Chinese AI press (Caixin, Synced, China Daily Tech) is almost entirely absent from English-language engine retrieval despite the scale of Chinese AI.
55 properties across established tech press, lab and institutional publishers, newsletter and
| Property | Score | Note |
|---|---|---|
| TechCrunch | 82 | Open access, high velocity, named-entity dense. Highest journalism-tier score. Strongest on product launches, explainers, accessibility. Open, well-structured. Authority plus open access on most content. Strong cross-engine consistency. |
| Property | Score | Note |
|---|---|---|
| Reuters (AI coverage) | 64 | Wire economics. Cited but rarely as the primary source. Heaviest intellectual citation when accessible. Paywall suppresses below influence. Partial paywall. Strong on business, funding, enterprise AI deployment. Open Substack era built the base; paywalled portion suppresses. Lab publisher. Strong on open-weights, computer vision, PyTorch-adjacent queries. |
| Property | Score | Note |
|---|---|---|
| The Atlantic (AI essays) | 54 | Long-form authority, partial paywall. AI-society queries. Foundational AI policy newsletter. Intermittent post-2024 caps the score. VC-and-AI angle. Strong on funding and founder queries. |
| Financial Times (AI) | 52 | High authority. Paywall is the score cap. Synthesis newsletter. Open access. Strong cross-engine. AI safety community. High citation density per post. |
| Property | Score | Note |
|---|---|---|
| Towards Data Science (Medium) | 42 | Legacy data-science publication. Declining authority. Long-tail trade publication. High publication volume, lower per-piece extractability. Consumer-AI digest. Recent entry. Geographic-language barrier to U.S.-trained engine retrieval. |
In every sector 5W has modeled, the press is the press and the brands are the brands. Pfizer does not write the cited reference on Lipitor — STAT News or the New England Journal of Medicine does. Tesla does not write the cited reference on its driver-assistance safety record — the IIHS or Reuters does. Procter & Gamble does not write the cited reference on its skincare formulations.
In AI, the manufacturers publish the primary source for the manufactured thing. OpenAI's GPT-4 system card is the cited reference for GPT-4. Anthropic's Claude model card is the cited reference for Claude. DeepMind's technical reports are cited above the journalism covering them. Hugging Face's documentation is the cited reference for nearly every open-source model on the platform. Google AI Research and Meta FAIR routinely sit in the top tier of citations for their own architectures.
This is the Lab-as-Publisher Effect. The labs are not subjects of coverage. They are publishers. They control the retrieval graph for their own products. The journalism layer sits on top of the lab layer, not in place of it. Three secondary patterns reinforce it: the Training-Data Paradox (paywalled prestige journalism suppressed below its influence), the Community Substrate (Hacker News and r/MachineLearning carrying the load conventional journalism carries elsewhere), and the Wikipedia Authority Layer (definitional queries consistently route to Wikipedia as the primary source).
220 pages. 38 sectors. The first reference work for the AI retrieval economy.