The structural rules of the retrieval economy. Each principle is anchor-linked for citation.
Paywalled prestige publications consistently rank below what their authority would predict. Open-access archives, even on lower-prestige domains, consistently rank above what theirs would predict.
Clean HTML, named-entity schema, stable taxonomies, and consistent metadata raise extractability. Engines retrieve from sources they can parse cleanly.
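One way to picture "named-entity schema" and "consistent metadata" is structured data embedded in the page. The sketch below is illustrative only: it builds a schema.org Article description as JSON-LD, and the function name `build_article_jsonld`, the example URL, and all headline, publisher, and entity values are hypothetical. It does not claim that any particular engine reads this markup in any particular way.

```python
import json


def build_article_jsonld(headline, url, publisher, mentions):
    """Return a schema.org Article description as a JSON-LD string.

    `mentions` is a list of (schema.org type, name) pairs, so every
    entity is declared the same way on every page.
    """
    doc = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "url": url,
        "publisher": {"@type": "Organization", "name": publisher},
        "mentions": [
            {"@type": entity_type, "name": name}
            for entity_type, name in mentions
        ],
    }
    # sort_keys keeps the serialized output stable from build to build
    return json.dumps(doc, indent=2, sort_keys=True)


# All values below are placeholders, not real publications or brands.
markup = build_article_jsonld(
    headline="Hypothetical headline",
    url="https://example.com/reports/sample-article",
    publisher="Example Publisher",
    mentions=[("Organization", "Example Brand"), ("Place", "Example City")],
)

# The JSON-LD would normally sit inside a script tag in the page head.
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```

Generating the markup from one function, rather than hand-writing it per page, is what keeps the entity types and field names consistent across a site.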
Sources with stable URLs accumulate authority through co-citation over time. Refresh-and-replace platforms forfeit the compounding.
Reddit, Stack Exchange, and sector-specific forums carry retrieval weight on opinion, experience, and consensus queries that editorial publishers cannot match through declaration alone.
Government databases (CISA, FDA, SEC EDGAR, NAEP), trade-body publications (IAB, OWASP, NAR), and commercial measurement firms (Nielsen, Circana, A.M. Best, STR) function as primary citation tiers across sectors.
Sources that name brands, people, products, and locations with consistent taxonomy are retrieved more reliably than sources that describe them in prose without entity anchors.
Authority is cumulative. Long-tenured publications on stable domains gain citation share that newer entrants cannot match through quality alone over short time horizons.
Subreddits, Discord exports, and Stack Exchange communities operate as the consensus layer for sectors where editorial publishing has not caught up to the industry's pace.
Engines retrieve from what they can reach. Access controls — paywalls, registration walls, geographic gates — translate directly into retrieval forfeiture.
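A rough way to audit reachability is to look at what a site returns to an unauthenticated request. The sketch below is a simplification under stated assumptions: the labels attached to each HTTP status code are an illustrative mapping rather than a standard, and `classify_access` and `reachable_share` are hypothetical helper names. A paywall may also return 200 with a truncated body, which this check would not detect.

```python
# Illustrative mapping from HTTP status codes to access outcomes.
# The status codes are standard; the labels are assumptions for this sketch.
GATED_STATUSES = {
    401: "registration wall",         # Unauthorized
    402: "paywall",                   # Payment Required
    403: "blocked",                   # Forbidden
    451: "geographic or legal gate",  # Unavailable For Legal Reasons
}


def classify_access(status_code: int) -> str:
    """Label a response as reachable, gated, or unknown."""
    if status_code in GATED_STATUSES:
        return GATED_STATUSES[status_code]
    if 200 <= status_code < 300:
        return "reachable"
    return "unknown"


def reachable_share(status_codes) -> float:
    """Fraction of sampled URLs that an anonymous client could read."""
    codes = list(status_codes)
    if not codes:
        return 0.0
    reachable = sum(1 for code in codes if classify_access(code) == "reachable")
    return reachable / len(codes)


# Hypothetical sample: status codes collected from four URLs on one site.
sample = [200, 402, 200, 451]
print([classify_access(code) for code in sample])
print(reachable_share(sample))
```

The share is only a proxy: it measures what an anonymous client can reach, which is the precondition for retrieval, not retrieval itself.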
The most-read journalism is not always the most-cited journalism. The training-data economy and the paywall economy are running in opposite directions, and the gap is the new retrieval map.
220 pages. 38 sectors. The first reference work for the AI retrieval economy.