5W AI Communications · Research
The 5W Retrieval Index — Volume I

Methodology

How properties are scored, tiered, and compared across the AI retrieval economy.

The Composite Score

Five components. Normalized to 0–100.

Every property in the Index is scored on a fixed five-component composite, normalized to 0–100. The components and their weights:

Citation Frequency (40%). How often a property appears as a primary source across cross-engine retrieval testing.

Cross-Engine Breadth (20%). Whether the property is cited reliably across all five major AI engines (ChatGPT, Claude, Perplexity, Gemini, Google AI Overviews) or only one or two.

Query-Type Breadth (20%). Whether the property is cited across the full range of buyer query classes for its sector (research, news, opinion, technical, comparison) or only one.

Extractability (15%). How well the property’s content can be parsed, attributed, and summarized: clean HTML, structured metadata, stable URLs, named entities.

Crawl Access (5%). Whether the property is reachable by the engines. Paywalls and registration walls subtract from this score.
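The weighting can be made concrete. Below is a minimal Python sketch, assuming (hypothetically) that each component is first scored on its own 0–100 scale; the Index does not say here how the individual components are measured, and the key names are illustrative. Because the published weights sum to 100%, a weighted sum of 0–100 sub-scores stays on the 0–100 scale without further rescaling.

```python
# Published composite weights; they sum to 1.0 (100%).
# The key names are illustrative, not taken from the Index.
WEIGHTS = {
    "citation_frequency": 0.40,
    "cross_engine_breadth": 0.20,
    "query_type_breadth": 0.20,
    "extractability": 0.15,
    "crawl_access": 0.05,
}


def composite_score(components: dict) -> float:
    """Return the weighted composite of the five component sub-scores.

    Assumption: every sub-score is already on a 0-100 scale.
    """
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= components[name] <= 100:
            raise ValueError(f"{name} must be in [0, 100], got {components[name]}")
    # The weights sum to 1.0, so the result also lies in [0, 100].
    return sum(weight * components[name] for name, weight in WEIGHTS.items())


# Hypothetical sub-scores for one property.
example = {
    "citation_frequency": 80,
    "cross_engine_breadth": 70,
    "query_type_breadth": 60,
    "extractability": 90,
    "crawl_access": 100,
}
print(round(composite_score(example), 2))  # 76.5
```

Citation Frequency alone contributes 32 of those 76.5 points, which shows how heavily the 40% weight tilts the composite toward raw citation volume.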

The Four Tiers

How properties are grouped.

Retrieval Anchor (72+). The primary citation tier for a sector. Sources the engines reliably return to. Operators competing in the sector must understand these sources because they shape every answer.

Cited (56–71). Regularly cited in their sector, but not always as the primary source. Important to track and to be present in, but not retrieval-defining.

Moderate (44–55). Surface occasionally in retrieval. Visible to the engines but not anchored. Often where emerging publications and specialist trade press sit before promotion.

Low-Yield (below 44). Rarely in regular engine rotation. Either too narrow, too paywalled, too new, or too obscure for the engines to weight reliably.
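The tier bands map directly onto a lookup. A minimal Python sketch; the only assumption is how a fractional score is handled, since the bands are published in whole points.

```python
def tier(score: float) -> str:
    """Map a 0-100 composite score to its Index tier.

    Bands: Retrieval Anchor 72+, Cited 56-71, Moderate 44-55, Low-Yield below 44.
    Assumption: a fractional score falls in the band whose lower bound it meets
    (e.g. 71.5 counts as Cited).
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score must be in [0, 100], got {score}")
    if score >= 72:
        return "Retrieval Anchor"
    if score >= 56:
        return "Cited"
    if score >= 44:
        return "Moderate"
    return "Low-Yield"


for s in (78, 76, 56, 43):
    print(s, tier(s))
```

Scores of 76 and 78 land in the same tier while 56 lands a tier lower, which is the comparison the tiers are meant to support.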

The Five AI Engines

Scope of coverage.

Retrieval behavior is modeled across the five major AI systems that answer buyer-class queries at scale: ChatGPT (OpenAI), Claude (Anthropic), Perplexity, Gemini (Google), and Google AI Overviews. The Index does not score regional engines (Baidu, Yandex), specialist verticals (Pi, Character.AI), or enterprise-only systems.

On the Estimates

What the scores represent.

Scores are directional estimates derived from structured cross-engine retrieval analysis, public citation observation, source accessibility assessment, and comparative retrieval modeling across the five engines listed above. Scores are normalized within sectors and benchmarked against observed citation frequencies. The Index models retrieval behavior directionally; it is not a precision audit.
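The section states that scores are normalized within sectors but not the method. One simple possibility, shown here purely as an assumption, is to rescale each sector's observed citation counts against that sector's most-cited property, so a dominant source in a small sector and one in a large sector both read as 100. The function name and sample sources are hypothetical.

```python
def normalize_within_sector(citation_counts: dict) -> dict:
    """Rescale one sector's raw citation counts to 0-100.

    Hypothetical method: divide by the sector's highest count, so the
    most-cited property scores 100 and the rest are proportional to it.
    """
    if not citation_counts:
        return {}
    top = max(citation_counts.values())
    if top == 0:
        # No observed citations in this sector: every property scores 0.
        return {name: 0.0 for name in citation_counts}
    return {name: 100.0 * count / top for name, count in citation_counts.items()}


# Hypothetical observed counts for three sources in one sector.
counts = {"source-a": 120, "source-b": 60, "source-c": 15}
print(normalize_within_sector(counts))
# {'source-a': 100.0, 'source-b': 50.0, 'source-c': 12.5}
```

Max-scaling keeps zero meaning "never cited". Min-max scaling would instead force the least-cited property in every sector to 0, whatever its absolute count.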

The Entity Layer

How engines actually retrieve.

Retrieval in AI engines is not list ranking. It is entity resolution. The engines maintain internal representations of brands, publications, products, people, and concepts as entities connected through co-citation, semantic reinforcement, and knowledge-graph relationships built during training and updated through retrieval. The composite scores in this Index are best read as proxies for entity authority within a sector: how reliably the engines resolve and surface a given source on the queries that matter.

Four entity-layer behaviors shape the scoring.

Co-citation density. The engines treat a source as more authoritative when it is cited alongside sources already established as authoritative. The resulting reinforcement loops register in the Index as durable rankings.

Semantic reinforcement. Sources whose entity descriptions match the engine’s internal taxonomy retrieve more reliably than sources whose descriptions do not.

Named-entity extraction. Sources with clean entity markup, consistent attribution, and stable proper-noun usage compound their visibility because the engines can parse and resolve them.

Knowledge-graph persistence. Sources cited reliably over time accumulate authority through compounding retrieval, which further entrenches the durable rankings the Index captures.
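The section does not specify how co-citation density is measured. The Python sketch below shows one way an outside observer could approximate it from public citation observation, under stated assumptions: each observed engine answer is reduced to the set of sources it cited, and a set of sources already treated as authoritative ("anchors") is given. For each source, it reports the share of its appearances in which at least one other anchor was cited alongside it. The function name and inputs are hypothetical.

```python
from collections import Counter


def co_citation_share(answers: list, anchors: set) -> dict:
    """Share of each source's appearances that include another anchor.

    Hypothetical proxy: `answers` is a list of sets, one per observed
    engine answer, each holding the sources that answer cited.
    """
    appearances = Counter()
    with_anchor = Counter()
    for cited in answers:
        for source in cited:
            appearances[source] += 1
            # Count only anchors other than the source itself.
            if (cited - {source}) & anchors:
                with_anchor[source] += 1
    return {source: with_anchor[source] / appearances[source] for source in appearances}


# Three hypothetical observed answers; "b" is the established anchor.
observed = [{"a", "b"}, {"a", "c"}, {"c"}]
print(co_citation_share(observed, anchors={"b"}))
```

Here "a" appears twice and once alongside the anchor "b", giving 0.5, while "c" never co-occurs with an anchor and scores 0.0.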

Limitations of the Model

Six limitations to read alongside every score.

1. Directional, not deterministic. Scores estimate where sources sit in the retrieval economy, not the exact rate at which any single engine returns them.

2. Engine behavior changes constantly. Model updates, training refreshes, and retrieval-system revisions shift citation behavior on weekly and monthly cycles. The Index captures a structural snapshot.

3. Query classes vary. A property may anchor one query class in a sector and barely surface in another. The composite score averages across the query classes most relevant to a sector’s buyers.

4. Retrieval differs by geography. The Index reflects English-language retrieval anchored in U.S.-trained engine behavior. Regional engines and non-English retrieval architectures behave differently.

5. Scores reflect observable patterns, not internal engine data. The Index does not access proprietary engine telemetry. It models patterns from external observation, public citation behavior, and structured retrieval analysis.

6. Rankings are comparative models, not exact measurements. A property scored 76 is meaningfully different from one scored 56. A property scored 76 is not meaningfully different from one scored 78. Read the tiers, not the individual points.

Get Volume I.

220 pages. 38 sectors. The first reference work for the AI retrieval economy.
