Eevy.ai

AI Search Visibility Audit for Ecommerce: A Step-by-Step Playbook (2026)

By Marius Møller-Hansen | 2026-07-03 | 10 min read

An AI search visibility audit is a structured check of what ChatGPT, Perplexity, Gemini, Google AI Overviews, and Copilot currently say about your brand, your products, and your category, run against a fixed question set so the results are comparable over time. You do not need paid tooling to run one. You need an afternoon, a spreadsheet, and the discipline to ask the same questions the same way every month. This playbook gives you the full procedure: build the question set, run it across the engines, score the answers, trace where each answer comes from, verify the technical plumbing, and turn the gaps into a prioritized fix list.

This is a different job than tracking AI referral traffic in GA4. Traffic tracking tells you how many visits AI sends; the audit tells you why that number is what it is: whether you are being named at all, what the engines believe about you, and which sources they are reading. If you have not set up the measurement side yet, do that too (see the tracking guide linked at the end), but the audit is where you find the things you can actually fix.

Why you audit instead of guessing

Most merchants have a vague sense of their AI visibility built from one or two anecdotal prompts: someone on the team asked ChatGPT "best [category]" once, the brand did not appear, and the conclusion was "AI ignores us." That is not an audit; it is a coin flip. Answers vary by phrasing, by session, by engine, and by day. A single prompt tells you almost nothing; a fixed battery of 20 to 40 prompts, repeated monthly, tells you a lot.

The audit produces three assets:

  1. A visibility baseline. Which engines mention you, for which questions, at what position, with what accuracy.
  2. A source map. The specific roundup articles, Reddit threads, review profiles, and publisher pages that feed each answer. These are your actual battleground, because you influence AI answers mostly by influencing what the engines read.
  3. A prioritized fix list. Technical gaps (blocked crawlers, missing schema), content gaps (no presence in the roundups that get cited), and accuracy gaps (wrong prices, dead products, stale policies being quoted).

Step 1: Build a fixed question set (20 to 40 prompts)

The whole method rests on asking the same questions every time. Write the set once, save it, and resist the urge to reword prompts between runs, because rewording destroys comparability.

Cover three intent stages, roughly in these proportions:

  • Awareness and category questions (about 40%). These are the "best X" and "how do I choose" questions where shoppers first meet brands. Examples: "best organic dog treats," "what should I look for in a standing desk," "best skincare for sensitive skin under $50." Include the price-qualified and audience-qualified variants your real customers use, because engines answer "best running shoes" and "best running shoes for flat feet under $120" from different sources.
  • Comparison questions (about 30%). "Brand A vs Brand B," "is [your brand] better than [leading competitor]," "alternatives to [category leader]." These reveal whether engines position you as a contender or omit you from the consideration set entirely.
  • Purchase and brand questions (about 30%). Direct questions about you: "is [your brand] legit," "what do reviews say about [your product]," "[your brand] return policy," "[your product] vs [competitor product]." These test accuracy rather than visibility. An engine that recommends you but quotes a discontinued price or a return policy you changed two years ago is a different problem than an engine that has never heard of you.

Write prompts the way shoppers actually type, not the way marketers phrase keywords. "whats a good non toxic yoga mat that doesnt smell" is a more realistic probe than "best non-toxic yoga mat 2026." Include a few of each register.

Store the set in a spreadsheet with one row per prompt and a column for intent stage. That spreadsheet becomes the left side of your scorecard.
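If you prefer to keep the question set in a file rather than typing it into a spreadsheet by hand, a minimal sketch could look like the following. The prompts, the `P01`-style IDs, and the column names are illustrative assumptions, not a required format; the only thing taken from the method above is the one-row-per-prompt layout and the rough 40/30/30 split.

```python
import csv
import io
from collections import Counter

# Hypothetical sample prompts; replace with your own fixed set of 20 to 40.
QUESTION_SET = [
    ("best organic dog treats", "awareness"),
    ("whats a good non toxic yoga mat that doesnt smell", "awareness"),
    ("alternatives to [category leader]", "comparison"),
    ("is [your brand] legit", "purchase"),
    ("[your brand] return policy", "purchase"),
]

# Rough target proportions per intent stage, as described in Step 1.
TARGET_SHARE = {"awareness": 0.40, "comparison": 0.30, "purchase": 0.30}


def stage_shares(question_set):
    """Return the actual share of prompts in each intent stage."""
    counts = Counter(stage for _, stage in question_set)
    total = len(question_set)
    return {stage: counts.get(stage, 0) / total for stage in TARGET_SHARE}


def to_csv(question_set):
    """Serialize the set as CSV text: one row per prompt, with a stable ID."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["prompt_id", "prompt", "intent_stage"])
    for i, (prompt, stage) in enumerate(question_set, start=1):
        writer.writerow([f"P{i:02d}", prompt, stage])
    return buf.getvalue()


if __name__ == "__main__":
    print(stage_shares(QUESTION_SET))
    print(to_csv(QUESTION_SET))
```

The stable prompt IDs are a design choice for comparability: later runs can be joined on the ID even if someone accidentally edits whitespace in the prompt text.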

Step 2: Run the set through each engine in fresh sessions

Run every prompt through, at minimum: ChatGPT (with web search active), Perplexity, Gemini, Google AI Overviews or AI Mode (search the prompt on Google and record what the AI surface shows, noting that Overviews do not trigger on every query), and Microsoft Copilot. Five engines, 20 to 40 prompts each; this is the bulk of the afternoon.

Hygiene rules that keep the data honest:

  • Fresh sessions. Start a new chat for every prompt, or at least for every intent group. Follow-up questions inside one conversation inherit context and contaminate the answer.
  • Logged-out or memory-off where possible. Engines with memory personalize toward what they know about you, and you are not your customer. Use a browser profile with no history for the Google surfaces, and turn off memory or use temporary chats in ChatGPT.
  • Record, do not summarize. Paste the full answer (or a screenshot link) into your log. Two months from now you will want the exact wording, not your compressed memory of it.
  • Capture citations. Perplexity, Copilot, AI Overviews, and search-enabled ChatGPT show sources. Copy every cited URL for the prompts where you appear and for the prompts where competitors appear instead. This feeds Step 4.
  • Expect variance. The same prompt can produce different answers an hour apart. You are sampling a distribution, not reading a ranking table. This is exactly why the question set is large and the cadence is monthly: trends across 30 prompts are meaningful where any single answer is not.
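The "record, do not summarize" and "capture citations" rules can be supported with a small logging helper. This is only a sketch under assumptions: it presumes you paste the full answer text with source URLs visible in it, and the field names (`prompt_id`, `cited_urls`, and so on) are hypothetical. Citations that an engine shows only as clickable chips will not appear in pasted text and still need to be copied by hand.

```python
import json
import re
from datetime import date

# Matches http(s) URLs up to whitespace or a closing bracket or quote.
URL_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")


def make_log_entry(prompt_id, engine, full_answer, run_date=None):
    """Build one log record that keeps the full answer and its cited URLs."""
    urls = []
    for match in URL_PATTERN.findall(full_answer):
        url = match.rstrip(".,;")  # drop trailing sentence punctuation
        if url not in urls:        # keep first-seen order, no duplicates
            urls.append(url)
    return {
        "run_date": (run_date or date.today()).isoformat(),
        "prompt_id": prompt_id,
        "engine": engine,
        "answer": full_answer,     # verbatim text, never a summary
        "cited_urls": urls,
    }


if __name__ == "__main__":
    answer = (
        "Top picks include Brand A (https://example.com/best-dog-treats). "
        "Source: https://www.reddit.com/r/dogs/comments/abc/."
    )
    entry = make_log_entry("P01", "Perplexity", answer, date(2026, 7, 3))
    print(json.dumps(entry, indent=2))
```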

Step 3: Log a simple scorecard

For every prompt and engine pair, record five things: were you mentioned, were you recommended (named as a pick, not just listed), were you cited (your own domain appears as a source), which sources fed the answer, and what the answer got wrong. A minimal scorecard looks like this:

| Prompt | Engine | Mentioned? | Recommended? | Own site cited? | Sources feeding answer | Errors / notes |
|---|---|---|---|---|---|---|
| best organic dog treats | ChatGPT | Yes | No (listed 6th of 7) | No | 2 roundup blogs, Reddit r/dogs | Quotes old 4.2 rating, current is 4.7 |
| best organic dog treats | Perplexity | Yes | Yes (top 3) | Yes | Own PDP, roundup, Reddit | None |
| [brand] vs [competitor] | Gemini | Yes | Mixed | No | Comparison blog from 2024 | Says we lack subscription option (we added it) |
| is [brand] legit | Copilot | Yes | n/a | Yes | Trustpilot, own site | Return window quoted as 14 days, actually 30 |
| best [category] under $50 | AI Overviews | No | No | No | Two competitor roundups | Absent entirely |

Then compute four summary numbers per engine: mention rate (share of prompts where you appear), recommendation rate (share where you are an actual pick), citation rate (share where your own domain is a source), and error count. These four numbers per engine are the audit's headline output, and they are what you will trend month over month.
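As a rough illustration of how the four summary numbers fall out of the scorecard, here is a minimal sketch. The `Row` class and its field names are assumptions made for this example, and it simplifies the scorecard by treating "Recommended?" as a plain yes/no, so "Mixed" or "n/a" entries would need a rule of your own.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Row:
    prompt: str
    engine: str
    mentioned: bool
    recommended: bool
    own_site_cited: bool
    errors: int  # number of factual errors noted in the answer


def summarize(rows):
    """Compute mention, recommendation, and citation rates plus error count per engine."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row.engine].append(row)

    summary = {}
    for engine, items in grouped.items():
        n = len(items)
        summary[engine] = {
            "prompts": n,
            "mention_rate": sum(r.mentioned for r in items) / n,
            "recommendation_rate": sum(r.recommended for r in items) / n,
            "citation_rate": sum(r.own_site_cited for r in items) / n,
            "error_count": sum(r.errors for r in items),
        }
    return summary


# Illustrative rows only, loosely modeled on the sample scorecard.
ROWS = [
    Row("best organic dog treats", "ChatGPT", True, False, False, 1),
    Row("is [brand] legit", "ChatGPT", True, True, True, 0),
    Row("best organic dog treats", "Perplexity", True, True, True, 0),
    Row("best [category] under $50", "Perplexity", False, False, False, 0),
]

if __name__ == "__main__":
    for engine, stats in summarize(ROWS).items():
        print(engine, stats)
```

Running the same function over a competitor's rows gives directly comparable rates.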

Score competitors on the same pass. Knowing that a rival hits 70% mention rate on category prompts while you hit 20% turns a vague anxiety into a specific, sized gap.

Step 4: Trace the sources

This is the step that converts observation into strategy. For every cited answer, list the source URLs and tally them across the whole audit. A pattern emerges fast: most category answers in a niche are fed by a surprisingly small set of pages, typically a handful of "best X" roundup articles, one or two Reddit threads, review platform profiles (Trustpilot, Google, sometimes app-store style listings), and a category publisher or two.
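The tally itself is simple enough to sketch in a few lines. The URLs below are made-up placeholders, and grouping by domain with a leading "www." stripped is one assumption about what counts as "the same source"; you may prefer to tally exact URLs only.

```python
from collections import Counter
from urllib.parse import urlparse


def source_domain(url):
    """Return the lowercase host of a URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def tally_sources(cited_urls):
    """Count how often each exact URL and each domain was cited across the audit."""
    by_url = Counter(cited_urls)
    by_domain = Counter(source_domain(u) for u in cited_urls)
    return by_url, by_domain


if __name__ == "__main__":
    # Hypothetical citations collected in Step 2.
    cited = [
        "https://www.reddit.com/r/dogs/comments/abc/",
        "https://example-roundup.com/best-organic-dog-treats",
        "https://reddit.com/r/dogs/comments/xyz/",
        "https://example-roundup.com/best-organic-dog-treats",
        "https://www.trustpilot.com/review/example.com",
    ]
    by_url, by_domain = tally_sources(cited)
    for domain, count in by_domain.most_common():
        print(f"{count:>3}  {domain}")
```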

Classify each recurring source into one of three buckets:

  • You can appear there. A roundup that takes product submissions, a Reddit thread where honest participation is possible, a review platform where you have no profile or a thin one. These become outreach and presence tasks.
  • You can improve what it says. Your own review profiles with low volume or stale ratings, an old comparison post citing your 2024 feature set. These become update and review-generation tasks.
  • You cannot influence it. A competitor's own content, a closed editorial list. Note it and move on.

Also note which of your own pages get cited when you do appear. If engines cite your product pages, those pages' content is your voice in the answer; if they only ever cite third parties about you, the engines either cannot read your site well or do not trust it as a source, which points straight at Step 5.

Step 5: Technical checks

Twenty minutes of plumbing verification, because none of the content work matters if the engines cannot read you.

AI crawler access. Fetch yourstore.com/robots.txt and confirm you are not blocking GPTBot and OAI-SearchBot (OpenAI), PerplexityBot, ClaudeBot (Anthropic), Google-Extended (governs Gemini's use of your content; regular Googlebot covers AI Overviews), or Bingbot (feeds Copilot). Blocks hide here more often than you would expect: a CDN or bot-protection layer can serve AI crawlers a different robots.txt than the one in your repo, or a 403 where you see a 200, so verify from outside your own network. If you are behind Cloudflare or similar, check the bot-management dashboard, not just the file. A useful spot check is to request a product page with an AI crawler user-agent via curl and confirm you get a 200 with real HTML.
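One way to check the robots.txt rules is with Python's standard-library `urllib.robotparser`, sketched below. It parses robots.txt text you have already fetched (ideally from outside your own network) and reports which of the listed user-agents are disallowed for a given page. The sample file and product URL are hypothetical, and the check covers only the robots.txt rules: it cannot detect a CDN or bot-protection layer returning a 403.

```python
from urllib.robotparser import RobotFileParser

# User-agents named in this step.
AI_CRAWLERS = [
    "GPTBot",
    "OAI-SearchBot",
    "PerplexityBot",
    "ClaudeBot",
    "Google-Extended",
    "Googlebot",
    "Bingbot",
]


def blocked_crawlers(robots_txt, page_url, agents=AI_CRAWLERS):
    """Return the user-agents that robots.txt forbids from fetching page_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, page_url)]


# Hypothetical robots.txt that blocks one AI crawler site-wide.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /cart
"""

if __name__ == "__main__":
    url = "https://yourstore.com/products/dog-treats"
    print(blocked_crawlers(SAMPLE_ROBOTS, url))
```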

Server-rendered structured data. View source (not the rendered DOM) on a top product page and confirm your Product schema with offers, aggregateRating, and review markup is present in the raw HTML. Schema injected client-side by JavaScript is invisible to most AI crawlers, which largely do not execute scripts. The same goes for the review content itself: if your reviews only exist inside a client-rendered widget, the engines are reading a page with no social proof on it.
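The view-source check can be approximated in code by parsing the raw HTML (the response body before any JavaScript runs) and looking for JSON-LD. The sketch below uses only the standard library. It assumes the Product schema is a top-level JSON-LD object or list item with `"@type": "Product"`, so markup nested under `@graph`, or expressed as microdata, would need extra handling. The sample HTML and its values are invented for illustration.

```python
import json
from html.parser import HTMLParser

REQUIRED_KEYS = ("offers", "aggregateRating", "review")


class JsonLdExtractor(HTMLParser):
    """Collect parsed JSON-LD blocks from <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is skipped, which is itself worth fixing


def product_schema_gaps(raw_html):
    """Return missing keys on the first Product schema, or None if none is found."""
    parser = JsonLdExtractor()
    parser.feed(raw_html)
    for block in parser.blocks:
        items = block if isinstance(block, list) else [block]
        for item in items:
            if isinstance(item, dict) and item.get("@type") == "Product":
                return [key for key in REQUIRED_KEYS if key not in item]
    return None


SAMPLE_HTML = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Dog Treats",
 "offers": {"@type": "Offer", "price": "12.00", "priceCurrency": "USD"},
 "aggregateRating": {"@type": "AggregateRating", "ratingValue": "4.7", "reviewCount": "120"}}
</script></head><body></body></html>"""

if __name__ == "__main__":
    print(product_schema_gaps(SAMPLE_HTML))
```

A result of `None` on a page that shows schema in the browser's rendered DOM suggests the markup is being injected client-side.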

Feed status. Confirm your Google Merchant Center feed is live, error-free, and current, since Google's shopping surfaces and AI answers lean on it, and check that prices and availability in the feed match the site. If OpenAI's product feed program or other engine-specific merchant feeds are available to you, being in them is worth the setup time; this area is moving quickly, so check current eligibility rather than assuming.
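The feed-versus-site comparison can be sketched as a simple diff. The example assumes you have already exported both sides into dictionaries keyed by SKU; the field names and values are simplified placeholders, not the actual Merchant Center export format.

```python
from decimal import Decimal


def feed_mismatches(feed_items, site_items):
    """List (sku, problem) pairs where the feed disagrees with the live site."""
    problems = []
    for sku, feed in feed_items.items():
        site = site_items.get(sku)
        if site is None:
            problems.append((sku, "missing on site"))
            continue
        # Decimal avoids float rounding surprises when comparing prices.
        if Decimal(feed["price"]) != Decimal(site["price"]):
            problems.append((sku, "price"))
        if feed["availability"] != site["availability"]:
            problems.append((sku, "availability"))
    return problems


if __name__ == "__main__":
    # Hypothetical data for illustration.
    feed = {
        "SKU-1": {"price": "12.00", "availability": "in_stock"},
        "SKU-2": {"price": "19.99", "availability": "in_stock"},
        "SKU-3": {"price": "8.50", "availability": "in_stock"},
    }
    site = {
        "SKU-1": {"price": "12.0", "availability": "in_stock"},
        "SKU-2": {"price": "21.99", "availability": "out_of_stock"},
    }
    print(feed_mismatches(feed, site))
```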

Step 6: Gap analysis and the prioritized fix list

Now merge everything into one ranked list. A practical priority order:

  1. Unblock and fix technical issues first. A blocked crawler or client-only schema caps everything else at zero. These fixes are usually hours, not weeks.
  2. Correct factual errors the engines repeat. Wrong prices, dead SKUs, outdated policies. Fix the source the engine cited (your own page, your review profile, an old post) so the next crawl picks up the truth.
  3. Close the source gaps with the best effort-to-impact ratio. Claim and grow thin review profiles, pursue inclusion in the two or three roundups that feed most category answers, participate honestly where community threads dominate.
  4. Build content for the prompts where nobody answers well. If the engines give generic, sourceless answers to a question your customers ask, a genuinely useful page on that question has a real shot at becoming the cited source.
  5. Strengthen your own cited pages. For pages the engines already read, make the extractable facts (ratings, review substance, specs, policies) accurate, current, and server-rendered.

Assign each item an owner and a month. Ten prioritized items beat forty aspirational ones.

Step 7: Re-run monthly and track the trend

Re-run the identical question set monthly, same engines, same fresh-session hygiene. Log the four summary rates per engine into a running sheet and annotate what you shipped between runs. Because individual answers are noisy, judge movement on the monthly trend of mention rate and recommendation rate, not on whether one specific prompt flipped. Two to three months is a realistic lag between fixing a source and seeing the answers shift, since engines recrawl and recompute on their own schedules.
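To judge movement on the trend rather than on single answers, the running sheet can be reduced to month-over-month changes per metric. The sketch below assumes a chronological list of monthly summaries for one engine; the months and rates are invented example values.

```python
def month_over_month(history, metric):
    """Return (month, change) pairs for one metric across consecutive runs.

    history: list of (month_label, {metric_name: value}) in chronological order.
    """
    deltas = []
    for (_, previous), (month, current) in zip(history, history[1:]):
        change = round(current[metric] - previous[metric], 4)
        deltas.append((month, change))
    return deltas


if __name__ == "__main__":
    # Hypothetical monthly summaries for a single engine.
    history = [
        ("2026-05", {"mention_rate": 0.20, "recommendation_rate": 0.10}),
        ("2026-06", {"mention_rate": 0.25, "recommendation_rate": 0.10}),
        ("2026-07", {"mention_rate": 0.35, "recommendation_rate": 0.15}),
    ]
    print(month_over_month(history, "mention_rate"))
    print(month_over_month(history, "recommendation_rate"))
```

Given the two-to-three-month lag described above, a flat month right after a fix is not evidence that the fix failed.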

A note on tooling: a category of paid AI-visibility tracking tools now exists that automates prompt monitoring across engines at larger scale. They are worth evaluating once the manual audit has proven the channel matters for you, but the manual run stays valuable even then, because reading full answers yourself catches positioning nuances and factual errors that a mention-counting dashboard summarizes away. The manual audit is the ground truth; tools are a scaling layer on top of it.

Close the loop: visibility gets the visit, the page converts it

The audit governs whether AI engines name you and send you shoppers. It does not govern what happens when that shopper lands. AI-referred visitors arrive with high intent and specific expectations set by the answer they just read, and the product page has to hold up its end: the reviews, the rating, the UGC that made you recommendable in the first place. This is where Eevy picks up: it continuously optimizes which reviews and UGC each shopper sees on every product page using a genetic algorithm, learning per product rather than store-wide, and Eevy stores lift conversion rate by an average of about 18%. It has a permanent free plan up to 25,000 monthly visitors, then starts at $99 a month. Run the audit to win the mention; let the page win the sale.

Frequently Asked Questions

What is an AI search visibility audit?

An AI search visibility audit is a structured check of what ChatGPT, Perplexity, Gemini, Google AI Overviews, and Copilot currently say about your brand, products, and category. You run a fixed set of 20 to 40 buying questions through each engine in fresh sessions, score whether you are mentioned, recommended, and cited, and trace which sources feed each answer. Repeated monthly, it shows whether your AI visibility is improving and exactly what to fix.

How do I check my brand's visibility in ChatGPT?

Ask ChatGPT (with web search active) the real buying questions in your category in fresh chats with memory off: "best X" questions, comparisons against competitors, and direct brand questions like "is [brand] legit." Record whether you are mentioned, whether you are an actual recommendation, whether your own site appears as a citation, and what the answer gets wrong. One prompt proves nothing because answers vary, so use a fixed question set and track mention rate over time.

Do I need paid tools to run an AEO audit?

No. The core audit needs only a spreadsheet and an afternoon: a fixed question set, manual runs across the major engines, a simple scorecard, and free technical checks like robots.txt crawler access and server-rendered schema. Paid AI-visibility tracking tools exist and can automate prompt monitoring at scale, but the manual audit remains the ground truth because reading full answers catches positioning nuances and factual errors that dashboards summarize away.

About the Author

Marius Møller-Hansen

Founder & CEO, Eevy AI

Founder of Eevy AI. Writes about Shopify conversion rate optimization, review systems, and the genetic-algorithm approach to e-commerce display testing.
