
How to run prompt-level SEO experiments for AI search

As LLM adoption continues to grow, optimizing brand visibility in AI-generated responses is becoming increasingly important. Consumers are turning to these models for answers, recommendations, recipes, vacation plans, and nearly everything else imaginable.

But what happens if your brand isn’t included in those responses? Can you influence the outcome? And what are some proven ways to improve your brand’s inclusion and visibility?

That’s where structured experimentation comes in. Prompt-level SEO requires more than assumptions or one-off wins. It requires repeatable testing frameworks that help isolate what actually influences LLM responses.

Build prompt-level SEO tests with a hypothesis framework

There are countless recommendations on how to improve your LLM presence. Experimentation is key to discovering what works for your industry and brand.

Hypothesis-driven testing is how we structure these tests for our brands. It breaks each test into consistent parts that can be replicated across experiments and situations.

This framework creates a common approach to testing and helps you quickly understand the test and its outputs. The structure consists of three main pieces: if, then, because. (A short sketch of how to record these follows the list.)

  • If: The hypothesis, i.e., the test action you plan to take.
    • “If we include more detailed product specifications in our content.”
  • Then: The expected outcome once the “if” action is completed.
    • “Then we’ll see our brand included in more product-specific prompts.”
  • Because: Why you believe this will occur, i.e., the theory behind the test.
    • “Because LLMs value detailed and specific information in their prompt responses.”
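To make the archive concrete, here is a minimal sketch in Python of how a hypothesis record might be kept. The HypothesisTest class and its fields are illustrative, not a prescribed format, and the example values restate the hypothesis above.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class HypothesisTest:
    """One prompt-level SEO test, archived in if/then/because form."""
    if_action: str        # the test action you plan to take
    then_outcome: str     # the expected, measurable outcome
    because_theory: str   # why you believe the outcome will occur
    started: date = field(default_factory=date.today)
    result: str = ""      # filled in after the measurement phase

spec_test = HypothesisTest(
    if_action="Include more detailed product specifications in our content",
    then_outcome="Brand included in more product-specific prompts",
    because_theory="LLMs value detailed, specific information in responses",
)
```

Whether you keep these records in code, a spreadsheet, or a test log, the point is the same: every test preserves its premise, action, and expected outcome.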

This framework enforces some basic fundamentals that ensure you’re thinking the test through. It also lets you go back later and validate whether you have tested these specific elements in the past, and what the premises, theories, and outcomes were.

This archive matters because the “if” and “then” of an old test may still be worth re-running even as the world shifts; what changes is often the “because” section.

Key considerations before running prompt-level SEO tests

Before we get to the recommendations for testing best practices, here are some considerations when running these tests:

  • Model updates: These models are updated constantly. When a model moves from version 4.1 to 4.2, it’s time to revisit your results. How did the update change the inputs and outputs?
  • Prompt drift: Have you ever run the exact same prompt twice in a day, or on consecutive days? Often, the results change. To get a true baseline, run each prompt more than once and on consecutive days, then evaluate the outcomes. This is no different from personalized search results: brands get comfortable with the variance, and averages surface that become the benchmark. Prompt testing works much the same way. (A small sketch of averaging these runs follows this list.)
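As a minimal sketch, assuming you already have a way to collect model responses, here is how a drift-aware baseline might be computed in Python. The brand name, responses, and daily rates are all invented for illustration.

```python
import statistics

def inclusion_rate(responses: list[str], brand: str) -> float:
    """Share of responses that mention the brand at least once."""
    return sum(brand.lower() in r.lower() for r in responses) / len(responses)

# Five runs of the same prompt on one day (answers invented for illustration).
day_1 = [
    "Top picks include Acme, Brooks, and Hoka.",
    "Consider Brooks or Hoka for flat feet.",
    "Acme and Brooks both make supportive models.",
    "Brooks, Acme, and Saucony are common recommendations.",
    "Hoka and Brooks are frequently suggested.",
]
print(inclusion_rate(day_1, brand="Acme"))  # 0.6

# Repeat daily, then average the daily rates to set the benchmark.
daily_rates = [0.6, 0.8, 0.6, 1.0, 0.8, 0.6, 0.8]  # example values
baseline = statistics.mean(daily_rates)
drift = statistics.stdev(daily_rates)  # run-to-run variance to expect
print(f"Baseline inclusion: {baseline:.0%} (drift of about {drift:.0%})")
```

The average is your benchmark; the spread tells you how big a change has to be before you should believe it.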

Now that you have the test framework, let’s look at the core elements that can be isolated in prompt-specific testing.

How to isolate variables: A methodological approach

Designing a reliable prompt-level SEO experiment requires isolating a single causal variable. This is crucial for confidently attributing changes in LLM response inclusion or position to a specific action.

1. Content changes

When testing content modifications, the variable must be surgical. A common pitfall is changing too much at once (e.g., updating a product description and the page’s schema).

  • Best practice — The single-paragraph swap: Focus on modifying a single, targeted piece of text on the page, such as a product description, FAQ answer, or a specific feature bullet point.
  • Methodology: For true isolation, implement A/B testing with a control page containing the original content and a test page containing the modified content. The prompt should be designed to target the specific information you changed. Measure the brand’s inclusion rate and position-in-response over a defined period (e.g., seven days). Keep in mind these models move at a variety of speeds; this work, much like SEO, is less microwave, more oven. (A sketch of one way to score position-in-response follows this list.)
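Position-in-response isn’t something the models report, so you need your own scoring convention. Here is one possible approach in Python that ranks brands by order of first mention; the brands and answer text are invented for illustration, and real responses may need more robust entity matching.

```python
def position_in_response(response: str, brands: list[str]) -> dict[str, int | None]:
    """Rank each brand by order of first mention (1 = mentioned first)."""
    offsets = {b: response.lower().find(b.lower()) for b in brands}
    mentioned = sorted((off, b) for b, off in offsets.items() if off >= 0)
    ranks: dict[str, int | None] = {b: None for b in brands}
    for rank, (_, b) in enumerate(mentioned, start=1):
        ranks[b] = rank
    return ranks

answer = "For flat feet, consider Brooks first, then Acme, then Hoka."
print(position_in_response(answer, ["Acme", "Brooks", "Hoka"]))
# {'Acme': 2, 'Brooks': 1, 'Hoka': 3}
```

First-mention order is a crude proxy, but applied consistently across control and test runs, it is enough to detect movement.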

2. Structured data

Structured data (schema) provides explicit signals to both search engines and LLM ingestion layers. Testing this requires treating the schema update as the only change to the page.

  • Variable isolation: Test adding new properties (e.g., brand, model, and offer details) without altering the visible HTML text. This isolates the impact of the machine-readable layer.
  • Specific experiment — FAQ schema: A highly effective experiment is adding FAQ schema to pages that already have Q&A sections in their HTML, which isolates the effect of the explicit markup on LLM ingestion. Our work with brands has shown this makes those sections easier for LLMs to ingest. (An example of the markup follows this list.)
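For reference, schema.org FAQ markup is typically embedded as JSON-LD. A minimal example, with a hypothetical product question, might look like this:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "Does the X100 support wireless charging?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "Yes, the X100 supports Qi wireless charging at up to 15W."
    }
  }]
}
</script>
```

The question and answer text should mirror the visible Q&A on the page exactly, so the markup remains the only new variable.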

3. Before-and-after prompt testing

This process involves establishing a stringent baseline, making the change, and then repeating the prompt query. This is an essential control method in lieu of true A/B testing on the LLM itself.

Protocol

  • Phase 1 (baseline): Execute a set of 5-10 target prompts daily for seven consecutive days to establish a true average of inclusion and position-in-response, accounting for prompt drift.
  • Action: Deploy the isolated change (e.g., content or schema update).
  • Phase 2 (measurement): Re-run the exact same set of prompts daily for the next seven days.
  • Analysis: Compare the average inclusion rate and position of Phase 1 versus Phase 2. This method is central to initial presence score analyses, such as using three buckets of 25 keywords and prompts for a total of 75 queries. (A sketch of the comparison follows this list.)
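As a minimal sketch with made-up numbers, the Phase 1 versus Phase 2 comparison might look like this in Python. The point is to judge any lift against the baseline drift, not against a single run.

```python
import statistics

# Daily inclusion rates across the prompt set; values are illustrative.
phase1 = [0.20, 0.24, 0.20, 0.28, 0.24, 0.20, 0.24]  # baseline week
phase2 = [0.32, 0.36, 0.28, 0.36, 0.32, 0.40, 0.36]  # week after the change

lift = statistics.mean(phase2) - statistics.mean(phase1)
noise = statistics.stdev(phase1)  # baseline drift, for context

print(f"Inclusion lift: {lift:+.1%} (baseline drift of about {noise:.1%})")
# Treat the lift as meaningful only if it clearly exceeds the drift.
```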

Encouraging reproducible experiments

With the speed of model evolution and the lack of detailed model insights, it’s difficult to ensure reproducibility of results. However, the goal is to move beyond simple “it worked once” findings to build a durable methodology.

Mandatory frameworks

Ensure every test is documented using the “if, then, because” hypothesis structure. This archives the premise, action, and expected outcome, allowing future teams to quickly validate whether a test remains relevant as LLMs evolve.

Technical integrity

  • Version control: Document the specific model and version used for testing (e.g., “Gemini 4.1.2”). This allows for easy comparison when a model update occurs.
  • Prompt libraries: Maintain an organized, time-stamped repository of the exact prompt queries used for baseline and measurement phases. This repository should track inclusion rate, position-in-response, and sentiment/framing for each query. (A minimal logging sketch follows this list.)
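A prompt library can be as simple as an append-only CSV. Here is a minimal Python sketch; the file name, column set, and example values are assumptions, and the model string reuses the article’s own hypothetical “Gemini 4.1.2.”

```python
import csv
from datetime import datetime, timezone

FIELDS = ["timestamp", "model_version", "prompt", "included",
          "position_in_response", "sentiment"]

def log_run(path: str, row: dict) -> None:
    """Append one prompt run to the library, writing a header on first use."""
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if f.tell() == 0:  # empty file: add the header row first
            writer.writeheader()
        writer.writerow(row)

log_run("prompt_library.csv", {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "model_version": "Gemini 4.1.2",  # the exact model under test
    "prompt": "best CRM for small teams",
    "included": True,
    "position_in_response": 2,
    "sentiment": "positive",
})
```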

Infrastructure consistency

Define the testing environment (e.g., clear browser cache, no login state) and, where possible, use APIs or synthetic testing platforms to remove the impact of personalization and location bias. This is analogous to controlling for personalized search results in traditional SEO.
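As one example of the API route, here is a minimal sketch using the OpenAI Python SDK; the model name is an assumption, and any provider’s API works the same way in principle. Each call is stateless, so no chat history or login personalization carries over between runs.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def run_prompt(prompt: str, model: str = "gpt-4o") -> str:
    """One stateless prompt run; no prior conversation is attached."""
    response = client.chat.completions.create(
        model=model,  # assumed model name; record whatever you actually test
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # reduces, but does not eliminate, run-to-run drift
    )
    return response.choices[0].message.content

print(run_prompt("What are the best project management tools for agencies?"))
```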

Moving beyond one-off wins in AI search

The key to prompt-level SEO is rigorous methodology. By adopting a hypothesis-driven approach, surgically isolating variables (content, entities, schema), and establishing strict before-and-after testing protocols, you can confidently move past speculation. 

The path to influencing LLM responses is paved with controlled, documented, and reproducible experiments.
