Methodology
How the Index was built.
The AI Companies AI Visibility Index 2026 analyzed 32,200 prompts across ChatGPT, Claude, Gemini, Perplexity, and Google AI Overviews, run in two independent waves to test whether findings held up against retrieval drift, model updates, and the high cross-wave volatility characteristic of this category.
Two-wave structure
- Wave 1: January 15 – February 12, 2026 · 16,100 prompts
- Wave 2: April 8 – May 6, 2026 · 16,100 prompts
Wave 2 reused the Wave 1 prompt set with no modification. Only findings stable across both waves within reporting tolerance (a delta of ≤2.0 percentage points on company-level Citation Share; ≤2.5 percentage points on source-level share) are published here. This category showed higher cross-wave volatility than banking, venture capital, or credit cards, driven by frequent model releases, active news cycles, and ongoing competitive dynamics during the testing period. The wider tolerance band reflects that reality without loosening the stability requirement.
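As a minimal sketch of the stability filter, assuming per-wave Citation Share values are already computed (the function name and data layout below are illustrative, not the Index's actual pipeline):

```python
# Illustrative stability filter; the data layout is an assumption.
# A finding is publishable only if its Wave 1 / Wave 2 delta is within tolerance.

COMPANY_TOLERANCE_PP = 2.0  # company-level Citation Share, percentage points
SOURCE_TOLERANCE_PP = 2.5   # source-level share, percentage points

def is_stable(wave1_share: float, wave2_share: float, level: str = "company") -> bool:
    """Return True if the cross-wave delta falls within the reporting tolerance.

    Shares are expressed in percentage points (e.g. 12.4 for 12.4%).
    """
    tolerance = COMPANY_TOLERANCE_PP if level == "company" else SOURCE_TOLERANCE_PP
    return abs(wave1_share - wave2_share) <= tolerance

# Hypothetical example: 14.1 pp in Wave 1 vs 15.8 pp in Wave 2 is a 1.7 pp
# delta, inside the 2.0 pp company-level band, so the finding would be reported.
assert is_stable(14.1, 15.8, level="company")
```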
Prompt design
Queries simulated real founder, developer, journalist, investor, and policymaker research behavior. Prompt types included:
- Branded company queries ("What does OpenAI do?")
- Non-branded category queries ("Best AI company in 2026")
- Comparison queries ("OpenAI vs Anthropic")
- Capability queries ("Best AI for coding", "Best AI for safety research")
- Recommendation queries ("Which AI should I use for my startup?")
- Technical queries ("Best open-source LLM")
Prompts were distributed evenly across the five engines so that each engine received the same prompt mix per category.
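Read as a configuration, the taxonomy above maps prompt types to sample prompts. The structure below is an assumed illustration using only the examples quoted in this section, not the Index's internal format:

```python
# Illustrative prompt taxonomy; the dict layout is an assumption for clarity.
PROMPT_TYPES = {
    "branded": ["What does OpenAI do?"],
    "non_branded_category": ["Best AI company in 2026"],
    "comparison": ["OpenAI vs Anthropic"],
    "capability": ["Best AI for coding", "Best AI for safety research"],
    "recommendation": ["Which AI should I use for my startup?"],
    "technical": ["Best open-source LLM"],
}

ENGINES = ["ChatGPT", "Claude", "Gemini", "Perplexity", "Google AI Overviews"]

# Even distribution: every engine receives the identical prompt mix per category.
prompt_plan = {engine: PROMPT_TYPES for engine in ENGINES}
```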
Seven AI company categories measured
- Foundation model labs (OpenAI, Anthropic, Google DeepMind, Meta, xAI)
- Open-source AI labs and projects (Mistral, Hugging Face, EleutherAI, AI2)
- Multimodal and generative AI (Stability AI, Runway, Character.AI)
- AI safety, alignment, and interpretability
- Application-layer and vertical AI (Glean, Harvey, Cursor, Replit, Adept)
- Major Chinese AI labs (DeepSeek, Alibaba Qwen, Moonshot Kimi, Baidu, Zhipu)
- Enterprise AI infrastructure (Cohere, AI21, Together, Reka)
Self-Citation Lift methodology
Self-Citation Lift was calculated within recommendation queries only: prompts that explicitly ask the AI assistant to suggest or recommend a model, company, or tool. For each engine with a parent company in the test set, the engine's citation rate for its parent's products was compared against the average citation rate for the same parent across the other four engines on identical prompts. A lift of 1.0x indicates no self-bias; a lift of 2.0x indicates the engine cites its parent twice as often as the other engines do. Lift was averaged across both waves.
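A worked sketch of the lift calculation, with hypothetical citation rates and an assumed data layout: divide the engine's citation rate for its parent's products by the mean rate the other four engines assign to the same parent, then average the per-wave lifts.

```python
from statistics import mean

def self_citation_lift(rates_by_engine: dict[str, float], engine: str) -> float:
    """Lift for one wave: the engine's citation rate for its parent's products
    divided by the mean citation rate for the same parent across the other
    four engines on identical recommendation prompts (layout is illustrative)."""
    own_rate = rates_by_engine[engine]
    peer_rates = [r for e, r in rates_by_engine.items() if e != engine]
    return own_rate / mean(peer_rates)

# Hypothetical values: an engine citing its parent at 0.30 while the other
# engines average 0.15 yields a 2.0x lift. The reported figure averages the
# two wave-level lifts.
wave1 = self_citation_lift({"Gemini": 0.30, "ChatGPT": 0.14, "Claude": 0.15,
                            "Perplexity": 0.16, "Google AI Overviews": 0.15},
                           "Gemini")
assert abs(wave1 - 2.0) < 1e-9
```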
Sampling
Each prompt was issued three times per engine within a wave, with responses sampled at varied time-of-day windows to reduce within-engine retrieval drift. Reported Citation Share values are the average score across all retrieved responses for a prompt, pooled across both waves.
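A minimal sketch of the per-prompt averaging, under the assumption that each retrieved response already carries a 0 to 1 score for a given entity (the scoring itself is described in the next subsection); the values below are hypothetical:

```python
from statistics import mean

# Three samples per engine per wave x two waves = six responses per prompt
# per engine. Scores are hypothetical per-response values for one entity.
response_scores = [1.0, 2/3, 1.0,   # Wave 1 samples
                   2/3, 1.0, 2/3]   # Wave 2 samples

prompt_level_share = mean(response_scores)  # the reported value for this prompt
```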
What "Citation Share" means operationally
Three distinct response signals were tracked per retrieved answer:
- Mention — the entity is named anywhere in the response
- Recommendation — the entity is named as a suggested or preferred option
- Source citation — the entity (or its owned domain, GitHub repository, or research papers) is cited as a reference
For the headline metric reported as Citation Share, all three signals were weighted equally per retrieved response. Source-level analyses (the publisher rankings in Figure 01) used source citations only.
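One plausible formalization of "weighted equally" is a per-response mean of the three signals treated as binary indicators; the function below is an assumption for illustration, not published scoring code.

```python
def response_score(mention: bool, recommendation: bool, source_citation: bool) -> float:
    """Equal-weight combination of the three tracked signals for one response."""
    return (mention + recommendation + source_citation) / 3

# A response that names the entity and cites its domain, but does not
# recommend it, contributes 2/3 toward the headline Citation Share metric.
assert abs(response_score(True, False, True) - 2 / 3) < 1e-9
```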
Cross-engine normalization
Each engine was weighted equally in aggregate figures, regardless of differences in response length or default-citation frequency per engine. This was a deliberate choice: weighting by raw citation volume would have over-weighted Perplexity and Google AI Overviews, both of which return more citations per response than ChatGPT, Claude, or Gemini by default. Where per-engine results diverge, those differences are reported separately (Figures 02 and 03).
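Equal engine weighting amounts to averaging per-engine shares rather than pooling raw citation counts; a sketch under that reading, with hypothetical numbers:

```python
from statistics import mean

# Per-engine Citation Share for one entity, in percentage points (hypothetical).
per_engine_share = {
    "ChatGPT": 11.0, "Claude": 10.0, "Gemini": 12.0,
    "Perplexity": 18.0, "Google AI Overviews": 19.0,
}

# Equal-weight aggregate: each engine contributes exactly one fifth, however
# many citations it returns per response.
aggregate_share = mean(per_engine_share.values())  # 14.0 pp

# Pooling raw citation counts instead would over-weight Perplexity and Google
# AI Overviews, which return more citations per response by default.
```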
Retrieval vs training-data signal
This benchmark does not separately attribute citations to retrieval (live web search at query time) versus training data (pre-trained associations). The two are increasingly entangled in production AI assistants and not externally observable. The Self-Citation Lift finding (Figure 02) cannot fully distinguish between retrieval bias and training-data bias — both contribute. Where engine-level behavior differs from a pure retrieval-only baseline, that variance is interpreted as a combination of retrieval architecture, training-data composition, and engine-specific ranking heuristics.
Limitations and disclosures
Results reflect sampled outputs during a defined testing window. AI models, training data, retrieval indexes, and ranking systems evolve continuously; results may shift outside the test period. The category measured here — AI companies themselves — shows higher cross-wave volatility than slower-moving industries and demands more frequent re-measurement. The Index is best read as a structured snapshot of observed system behavior across two waves, not as a continuous live measurement. The full prompt set, per-engine response logs, and category-level datasets are available on request for replication.
Disclosure: 5W has commercial relationships in the AI industry and serves as a senior advisor to Curium.io, the team that coined Generative Engine Optimization. Companies cited in this Index were selected based on observed Citation Share in the test set; inclusion does not imply a commercial relationship. 5W does not accept compensation for ranking placement.