Methodology
How the Index was built.
The Defense & Aerospace AI Visibility Index 2026 analyzed 28,400 prompts across ChatGPT, Claude, Gemini, Perplexity, and Google AI Overviews, run in two independent waves to test for stability across retrieval drift, model updates, and active news cycles in this category.
Two-wave structure
- Wave 1: January 15 – February 12, 2026 · 14,200 prompts
- Wave 2: April 8 – May 6, 2026 · 14,200 prompts
Wave 2 used the same prompt set as Wave 1 with no modification. Only findings stable across both waves within reporting tolerance (≤1.5 percentage-point delta on company-level Citation Share; ≤2.0 pp on source-level share) are published here; findings that were unstable across waves were excluded from the Index.
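The stability filter above can be sketched as a simple tolerance check. This is an illustrative sketch, not 5W's actual pipeline; the function names and example values are hypothetical, and only the two published tolerances are taken from the text.

```python
# Cross-wave stability filter (illustrative sketch; names and example
# values are hypothetical). A finding is kept only if the Wave 1 / Wave 2
# delta is within the published reporting tolerance.

COMPANY_TOLERANCE_PP = 1.5  # company-level Citation Share tolerance
SOURCE_TOLERANCE_PP = 2.0   # source-level share tolerance

def is_stable(wave1_share: float, wave2_share: float,
              tolerance_pp: float) -> bool:
    """Shares are expressed in percent; the delta is in percentage points."""
    return abs(wave1_share - wave2_share) <= tolerance_pp

# Hypothetical example values:
print(is_stable(12.4, 13.1, COMPANY_TOLERANCE_PP))  # True: 0.7 pp delta
print(is_stable(12.4, 14.3, COMPANY_TOLERANCE_PP))  # False: 1.9 pp delta
```

Note the delta is measured in percentage points (an absolute difference of shares), not as a relative percentage change.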
Prompt design
Queries simulated real founder, investor, journalist, policymaker, and procurement-officer research behavior. Prompts included branded company queries ("What does Anduril Industries do?"), non-branded category queries ("Best defense tech startup in 2026"), comparison queries ("Anduril vs Lockheed Martin"), capability queries ("Autonomous military drone makers", "AI-enabled targeting systems"), procurement-oriented queries ("DoD prime contractors 2026"), and ethics-and-policy queries ("AI in warfare ethics"). Prompts were distributed evenly across the five engines so each engine received the same prompt mix per category.
Seven defense and aerospace categories measured
- Modern defense tech (Anduril, Palantir, Shield AI, Saronic, Skydio, others)
- Legacy defense primes (Lockheed Martin, Northrop Grumman, RTX, Boeing Defense, General Dynamics, L3Harris)
- Space and launch (SpaceX, Rocket Lab, Stoke Space, Relativity Space)
- Earth observation and satellite intelligence (Planet Labs, BlackSky, Capella Space, Astranis)
- Autonomous systems and AI-in-warfare
- European defense tech (Helsing, BAE Systems — extended dataset)
- Procurement and DoD contracting
Sampling
Each prompt was issued three times per engine within a wave, with responses sampled at varied time-of-day windows to reduce within-engine retrieval drift. Reported Citation Share values are averaged over all retrieved responses for a prompt across both waves.
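The sampling design implies a fixed number of retrieved responses per prompt, averaged into one reported value. A minimal sketch, assuming the structure described above (the data layout and function names are hypothetical):

```python
# Sampling structure (illustrative sketch): 3 runs per engine per wave,
# so each prompt yields 5 engines x 2 waves x 3 runs = 30 responses.
from statistics import mean

ENGINES = ["ChatGPT", "Claude", "Gemini", "Perplexity", "Google AI Overviews"]
WAVES = [1, 2]
RUNS_PER_ENGINE_PER_WAVE = 3

def samples_per_prompt() -> int:
    """Total retrieved responses contributing to one prompt's value."""
    return len(ENGINES) * len(WAVES) * RUNS_PER_ENGINE_PER_WAVE  # 30

def prompt_average(observations: dict[tuple, float]) -> float:
    """observations keyed by (engine, wave, run) -> per-response score."""
    return mean(observations.values())
```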
What "Citation Share" means operationally
Three distinct response signals were tracked per retrieved answer:
- Mention — the entity is named anywhere in the response
- Recommendation — the entity is named as a suggested or preferred option
- Source citation — the entity (or its owned domain) is cited as a reference
For the headline metric reported as Citation Share, all three signals were weighted equally per retrieved response. Source-level analyses (the publisher rankings in Figure 01) used source citations only. Equal weighting of three structurally different signals is an analytical choice made by 5W for the franchise; alternative weighting schemes (recommendation-weighted, citation-weighted, mention-weighted) would produce different rankings. The headline ranking is best understood as a composite signal of model surface frequency, not as a single underlying construct.
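The equal-weight composite described above can be made concrete with a short sketch. This is an assumption-laden illustration of the weighting scheme, not 5W's scoring code; the signal representation and function names are hypothetical.

```python
# Equal-weight composite Citation Share (illustrative sketch; data layout
# is hypothetical). Each retrieved response contributes three binary
# signals per entity, weighted 1/3 each for the headline metric.
from dataclasses import dataclass

@dataclass
class ResponseSignals:
    mention: bool          # entity named anywhere in the response
    recommendation: bool   # entity named as a suggested or preferred option
    source_citation: bool  # entity (or its owned domain) cited as a reference

def composite_score(sig: ResponseSignals) -> float:
    """Equal 1/3 weighting of the three signals for one response."""
    return (sig.mention + sig.recommendation + sig.source_citation) / 3

def citation_share(responses: list[ResponseSignals]) -> float:
    """Headline Citation Share for one entity, in percent."""
    return 100.0 * sum(map(composite_score, responses)) / len(responses)
```

A recommendation-weighted or citation-weighted variant would change only the weights inside `composite_score`, which is why alternative schemes can produce different rankings from the same underlying data.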
Citation Share is an internal composite metric. It is not a standard industry measure. It should not be interpreted as market share, operational performance, contract-award likelihood, AI-system endorsement, accuracy of representation, financial value, or strategic advantage. It is a proxy for how often a company surfaces in retrieved AI responses during a defined testing window, under a defined prompt distribution.
Cross-engine normalization
Each engine was weighted equally in aggregate figures, regardless of differences in response length or default-citation frequency per engine. This was a deliberate choice; weighting by raw citation volume would have over-weighted Perplexity and Google AI Overviews, both of which return more citations per response than ChatGPT, Claude, or Gemini by default. Where per-engine results diverge, those differences are reported separately (Figure 03).
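The normalization choice above amounts to averaging per-engine shares rather than pooling raw counts. A minimal sketch under that assumption (all values and names hypothetical):

```python
# Equal-engine aggregation (illustrative sketch): compute the share within
# each engine first, then average the per-engine shares, so engines that
# return more citations per response (e.g. Perplexity, Google AI Overviews)
# do not dominate the aggregate figure.
from statistics import mean

def engine_share(hits: int, responses: int) -> float:
    """Share of responses (in percent) within a single engine."""
    return 100.0 * hits / responses

def aggregate_share(per_engine: dict[str, tuple[int, int]]) -> float:
    """per_engine maps engine name -> (entity hits, responses retrieved)."""
    return mean(engine_share(h, n) for h, n in per_engine.values())

# Contrast: pooling raw counts weights each engine by its citation volume,
# which the Index deliberately avoids.
def pooled_share(per_engine: dict[str, tuple[int, int]]) -> float:
    total_hits = sum(h for h, _ in per_engine.values())
    total_resp = sum(n for _, n in per_engine.values())
    return 100.0 * total_hits / total_resp
```

With hypothetical counts of (400 hits / 1,000 responses) on one engine and (10 / 100) on another, the equal-engine aggregate is 25.0% while the pooled figure is about 37.3%, showing how volume-weighting would tilt results toward high-citation engines.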
SpaceX cross-category note
SpaceX's score reflects all SpaceX-related citations across defense, space, and adjacent technology queries. SpaceX's defense-specific Citation Share (Starshield and DoD contracts only) was approximately 7.4%; its space-specific Citation Share was approximately 62.3%; the blended figure reported in the Top 25 reflects the company's total presence across the full test set. SpaceX is the only company among the 25 measured whose score spans categories in this way.
Allied defense exclusion
Israeli defense companies (Rafael, Elbit, IAI), Indian defense companies, and other allied defense firms were tested in an extended dataset but excluded from the top-25 ranking to maintain a U.S. frame consistent with prior 5W benchmarks. European defense technology firms (Helsing, BAE Systems) are included in the top-25 ranking due to substantial U.S. commercial presence and inclusion in U.S.-readership defense trade press.
What this benchmark does not measure
This benchmark does not measure model accuracy, factual correctness of retrieved responses, sentiment of citations, or quality of underlying source content. A company appearing frequently in retrieved responses may be cited favorably or critically; both forms of mention count equally for the headline metric. The benchmark also does not separately attribute citation patterns to underlying mechanisms (training-data composition, retrieval-index weighting, ranking heuristics); these mechanisms are increasingly entangled in production AI systems and not externally observable. Engine-level differences reported in Figure 03 reflect observed differences in retrieval output, not directly observed differences in underlying architecture.
Limitations
Because this study measures AI-generated outputs across a defined time window, results reflect model behavior during that period and may not generalize across future model versions, retrieval-index updates, or different prompt distributions. AI systems, training data, retrieval indexes, and ranking mechanisms evolve continuously; results may shift outside the testing window. Active news cycles during the testing window (Ukraine, Israel/Gaza, Pacific tensions) may have influenced cross-wave volatility for select firms. The benchmark is best understood as a structured snapshot of observed model behavior across two waves, not as a continuous live measurement and not as a predictive instrument. The full prompt set, per-engine response logs, and category-level datasets are available on request for replication.
Disclosure. 5W has commercial relationships across the defense and aerospace industry and serves as a senior advisor to Curium.io, the team that coined Generative Engine Optimization. Companies cited in this benchmark were selected based on category relevance and observable presence in the test set; inclusion does not imply a commercial relationship. 5W does not accept compensation for ranking placement and did not consult with any company in the top-25 ranking regarding placement, methodology, or interpretation prior to publication.