What is an A/B Test?
A/B testing, in its strict definition, is a form of statistical hypothesis testing with two variants: test and control. Within Interaction Studio, the term applies more broadly to any campaign that uses a randomized traffic split to test a hypothesis. There are several things you should understand before creating an A/B test:
- When creating an Interaction Studio campaign, you can create multiple experiences, each shown to a set percentage of traffic that you determine based on your needs.
- As a best practice for an unbiased, statistically sound test, create messages as variations of each other. For example, testing a popup against an inline message is not a true measure of effectiveness, because the two formats invoke different user behaviors and cannot be compared directly. A better example of an A/B test would be a campaign that tests which exit popup discount code is more effective.
Create an A/B Test
Create an A/B Test Across Multiple Campaigns
Some use cases require targeting multiple content zones with separate campaigns, and require those personalizations to be coordinated so that the customer has a seamless experience. For example, you might want to personalize a homepage hero banner based on a customer's favorite category, then show a recommendations zone lower on the page with recommendations in that same category. To ensure customers receive the aligned experience across both campaigns, you can use A/B Test Segments to randomize your audience within a rules-based campaign type, and persist that randomized selection for any experience targeted to the A/B test segment.
A/B Testing Considerations
Clearly State Your Testing Goals as a Hypothesis. A typical Interaction Studio hypothesis should look something like this: "If I change (A) for a group of users (B), I expect to see (C) results."
Research Your Hypothesis Ahead of Time. Before developing your campaign, conduct some initial analysis to validate that your hypothesis is worth testing. For example, before developing a campaign you might want to ask yourself, "Does my hypothesis have a target audience? If so, what is the size of that audience? Can I create an audience segment in Interaction Studio?" This will allow you to determine the reach of your hypothesis and resultant campaign.
Give the Test Adequate Time
- Resist a rush to judgment. In the early days of a test, it is best to ignore the campaign statistics, since they can change before stabilizing. This is especially true for low-traffic tests.
- Use multiples of a full week of data. Since behavior varies across days, a best practice is to include multiples of a full week of data to determine the test winner over a period of time. You can set the beginning and end dates on the Campaign Statistics screen.
- Traffic Allocation: Each user profile is randomly selected into a percentile group. Hence, the traffic split may not be reflective of the experience allocations when the campaign is first set live, but will even out as traffic levels increase.
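
The traffic allocation point above can be illustrated with a minimal sketch. It assumes a hash of the user ID as the source of randomness, which gives each profile a stable percentile group; this is an illustrative technique, not a description of how Interaction Studio assigns groups internally. The names `percentile_bucket`, `assign_experience`, `observed_split`, and the `salt` parameter are hypothetical.

```python
import hashlib


def percentile_bucket(user_id: str, salt: str = "ab-test") -> int:
    """Map a user ID to a stable percentile group in the range 0..99.

    Assumption: hashing stands in for the platform's random selection.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def assign_experience(user_id: str, allocations, salt: str = "ab-test") -> str:
    """Pick an experience from a list of (name, percent) pairs summing to 100."""
    bucket = percentile_bucket(user_id, salt)
    upper = 0
    for name, percent in allocations:
        upper += percent
        if bucket < upper:
            return name
    raise ValueError("allocations must sum to 100")


def observed_split(n_users: int, allocations) -> dict:
    """Share of n_users synthetic profiles that landed in each experience."""
    counts = {name: 0 for name, _ in allocations}
    for i in range(n_users):
        counts[assign_experience(f"user-{i}", allocations)] += 1
    return {name: count / n_users for name, count in counts.items()}


if __name__ == "__main__":
    split = [("control", 50), ("test", 50)]
    # Small samples can drift from 50/50; larger samples move closer to it.
    for n in (20, 200, 20000):
        print(n, observed_split(n, split))
```

Because the assignment depends only on the user ID and the salt, the same profile always lands in the same group, while the observed split approaches the configured percentages only as more profiles arrive.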
The End of the Test is Not the End of the Story
- Post-test lift may be higher or lower. Even with a clear winner, you may not be able to replicate the exact lift you saw in the experiment. What matters is that the winner's metric remains higher than the loser's.
- Check for "Shiny Object Syndrome." When you do see a winner, a best practice is to check whether you see the same effect across groups of first-time and returning visitors. If you don't see the effect for first-time visitors, this lift may be due to novelty or "shiny object syndrome." Your regular visitors are curious about the change, but will eventually revert to their previous behavior. If you do see the effect for first-time visitors, it is more likely to be lasting.
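
One way to run the first-time versus returning check is to compute the relative lift separately for each visitor group. The sketch below assumes you have exported conversion and visitor counts per group; the function names and the input layout are hypothetical, and any numbers you pass in are your own.

```python
def relative_lift(control_rate: float, test_rate: float) -> float:
    """Relative lift of test over control, e.g. 0.2 means a 20% lift."""
    if control_rate <= 0:
        raise ValueError("control_rate must be positive")
    return (test_rate - control_rate) / control_rate


def lift_by_segment(results: dict) -> dict:
    """Compute relative lift per visitor segment.

    results maps a segment name to
    {"control": (conversions, visitors), "test": (conversions, visitors)}.
    """
    lifts = {}
    for segment, arms in results.items():
        control_conversions, control_visitors = arms["control"]
        test_conversions, test_visitors = arms["test"]
        lifts[segment] = relative_lift(
            control_conversions / control_visitors,
            test_conversions / test_visitors,
        )
    return lifts
```

A lift that appears only in the returning-visitor group, and not among first-time visitors, is the pattern the novelty explanation predicts.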
Decide Whether to Use a Sample Size Calculator
How long you run a test should be driven by your data rate and the size of the effect you're looking for. You may want to consider using a sample size calculator when planning your test. There are many available online that you can reference. This will give you an idea of how much data you need for a given test, and that in turn will give you a sense of how long to run the experiment. In classical statistics, you do this in advance, and check your result once (and only once) when you have the required data; at that time you declare whether any difference you see is real or due to chance.
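
As a rough illustration of what an online sample size calculator does, the sketch below uses the standard normal-approximation formula for comparing two conversion rates in a classical, two-sided test. The defaults of 5% significance and 80% power are common conventions assumed here, not values taken from Interaction Studio, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate: float, expected_rate: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed per variant to detect the given difference."""
    if not (0 < baseline_rate < 1 and 0 < expected_rate < 1):
        raise ValueError("rates must be between 0 and 1")
    if baseline_rate == expected_rate:
        raise ValueError("rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    pooled = (baseline_rate + expected_rate) / 2
    variance_sum = (baseline_rate * (1 - baseline_rate)
                    + expected_rate * (1 - expected_rate))
    numerator = (z_alpha * math.sqrt(2 * pooled * (1 - pooled))
                 + z_beta * math.sqrt(variance_sum)) ** 2
    return math.ceil(numerator / (expected_rate - baseline_rate) ** 2)


def days_needed(n_per_variant: int, daily_visitors_per_variant: int) -> int:
    """Translate a sample size into a run length at a steady data rate."""
    return math.ceil(n_per_variant / daily_visitors_per_variant)


def round_up_to_full_weeks(days: int) -> int:
    """Round a run length up to a multiple of seven days."""
    return math.ceil(days / 7) * 7


if __name__ == "__main__":
    n = sample_size_per_variant(0.05, 0.06)
    print(n, round_up_to_full_weeks(days_needed(n, 1000)))
```

Smaller expected effects require much larger samples, which is why the size of the effect you are looking for and your data rate together determine how long the test should run.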
- Using a sample size calculator is optional. Sample size calculation is known as a power analysis, and is a required step in classical statistical testing. Because Interaction Studio uses a Bayesian approach rather than a classical framework, you are not required to calculate your sample size, but doing so may help with planning if you're unsure how long a test should run.
- Compute the sample size and run the test. If you use a sample size calculator and reach the computed sample size without reaching 95% confidence, you can stop the test and declare that there was no difference. (You can use a different confidence level if you like.) If you have reached 95%, you can declare a winner.
- You may reach confidence sooner than expected. In contrast to classical testing, Interaction Studio models the underlying distributions directly, and typically requires fewer samples. So, if you planned to run your test for a month but see confidence at three weeks, you can stop and declare a difference.
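
To give a sense of what a Bayesian confidence figure can mean, the following sketch estimates the probability that the test experience has a higher conversion rate than control. It assumes a Beta-Binomial model with a uniform prior and Monte Carlo sampling; this is a common textbook approach chosen for illustration, and nothing here is confirmed to be the model Interaction Studio uses. The function name and parameters are hypothetical.

```python
import random


def probability_test_beats_control(control_conversions: int, control_visitors: int,
                                   test_conversions: int, test_visitors: int,
                                   draws: int = 20000, seed: int = 0) -> float:
    """Estimate P(test rate > control rate) under Beta(1, 1) priors."""
    if control_conversions > control_visitors or test_conversions > test_visitors:
        raise ValueError("conversions cannot exceed visitors")
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    wins = 0
    for _ in range(draws):
        # Posterior for each arm: Beta(1 + conversions, 1 + non-conversions).
        control_rate = rng.betavariate(1 + control_conversions,
                                       1 + control_visitors - control_conversions)
        test_rate = rng.betavariate(1 + test_conversions,
                                    1 + test_visitors - test_conversions)
        if test_rate > control_rate:
            wins += 1
    return wins / draws
```

Under these assumptions, a returned value at or above 0.95 would correspond to the 95% threshold discussed above, and identical results in both arms give a value near 0.5.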