Interaction Studio Classic Only
Please note, the contents of this article are intended for customers using Interaction Studio (formerly Evergage Classic). Do not adjust your beacon version to downgrade or upgrade.
This article details how to use A/B Testing to split traffic randomly into different groups and show each group variations of a message.
What is an A/B Test?
A/B testing, in its true definition, is a form of statistical hypothesis testing with two variants: test and control. Within Interaction Studio, we use the term A/B testing more broadly to describe any campaign that uses a randomized split of traffic to test a hypothesis. There are several things you should understand before creating an A/B test:
- When creating an Interaction Studio campaign, you can create multiple experiences, each shown to a percentage of traffic that you set to suit your needs (a conceptual sketch of such a split follows this list)
- As a best practice for an unbiased, statistically sound test, create the messages in each experience as variations of one another. For example, testing a popup against an inline message is not a true measure of effectiveness because the two formats invoke different user behaviors and cannot be compared directly. A better example of an A/B test would be a campaign that tests which exit popup discount code is more effective
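Interaction Studio handles the random split for you when you set experience percentages, so no code is required. Purely to illustrate the idea, the sketch below shows one common way a percentage-based split can be implemented: hashing a visitor ID into a stable bucket so each visitor always sees the same experience. The function name, visitor ID, and experience labels are hypothetical, not Interaction Studio internals.

```python
import hashlib

def assign_experience(visitor_id: str, weights: dict[str, float]) -> str:
    """Deterministically map a visitor to an experience according to percentage weights.

    Illustrative sketch only; Interaction Studio performs this split internally.
    """
    # Hash the visitor ID into a stable fraction in the range [0, 1].
    digest = hashlib.sha256(visitor_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF

    # Walk the cumulative weights until the bucket falls inside an experience's range.
    cumulative = 0.0
    for experience, weight in weights.items():
        cumulative += weight
        if bucket < cumulative:
            return experience
    return experience  # covers rounding at the very top of the range

# Example: a 50/50 split between two hypothetical exit-popup discount codes.
print(assign_experience("visitor-123", {"10-percent-off": 0.5, "free-shipping": 0.5}))
```

Because the assignment is derived from the visitor ID rather than a fresh random draw, the same visitor sees the same experience on every visit, which keeps the comparison clean.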
Create an A/B Test
Things to Consider Prior to Testing
Give the Test Adequate Time
- Resist a rush to judgment. In the early days of a test, it is best to ignore the campaign statistics since they can swing considerably before stabilizing. This is especially true for low-traffic tests.
- Use multiples of a full week of data. When you are evaluating a test, a best practice is to include multiples of a full week of data, since behavior varies across the days of the week. You can set the beginning and end dates on the Campaign Statistics screen.
The End of the Test is Not the End of the Story
- Post-test lift may be higher or lower. To set expectations, even if you have a winner, you may not replicate the exact lift you saw in the experiment. The key point is that the winner's metric will be higher than the loser's.
- Check for "Shiny Object Syndrome." When you do see a winner, a best practice is to check whether you see the same effect across first-time and returning visitors. If you don't see the effect for first-time visitors, this lift may be due to novelty or "shiny object syndrome." Your regular visitors are curious about the change, but will eventually revert to their previous behavior. If you do see the effect for first-time visitors, it is more likely to be lasting.
Decide Whether to Use a Sample Size Calculator
How long you run a test should be driven by your data rate and the size of the effect you're looking for. You may want to consider using a sample size calculator when planning your test; many are available online, including http://www.evanmiller.org/ab-testing/sample-size.html. A calculator will give you an idea of how much data you need for a given test, which in turn gives you a sense of how long to run the experiment. In classical statistics, you do this in advance and check your result once (and only once) when you have the required data; at that time you declare whether any difference you see is real or due to chance.
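If you would rather compute the number yourself than use an online calculator, the standard two-proportion approximation sketched below produces a comparable per-experience sample size. The baseline conversion rate, target rate, significance level, and power used here are assumptions to replace with your own values.

```python
from scipy.stats import norm

def sample_size_per_experience(p1: float, p2: float,
                               alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed in each experience to detect a change from p1 to p2
    with a two-sided test at the given significance level and power."""
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value for the significance level
    z_beta = norm.ppf(power)           # critical value for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return round(((z_alpha + z_beta) ** 2 * variance) / (p1 - p2) ** 2)

# Example: a 5% baseline conversion rate, looking for an absolute lift to 6%.
print(sample_size_per_experience(0.05, 0.06))  # roughly 8,200 visitors per experience
```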
- Using a sample size calculator is optional. Sample size calculation is known as a power analysis and is a required step in classical statistical testing. Because Interaction Studio uses a Bayesian approach rather than a classical framework, calculating a sample size is not required, but it may help with planning if you're unsure how long a test should run.
- Compute the sample size and run the test. If you decide to use a sample size calculator, run the test until you reach the computed sample size. If at that point you have not reached 95% confidence, you can stop the test and declare that there was no difference. (You can use a different confidence level if you like.) If you have reached 95%, you can declare a winner.
- You may reach confidence sooner than expected. In contrast to classical testing, Interaction Studio models the underlying distributions directly, and typically requires fewer samples. So, if you are running your test for a month, but see confidence at 3 weeks, you can stop and declare a difference.
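Interaction Studio reports this confidence for you. For readers curious what a Bayesian comparison looks like in practice, the sketch below models each experience's conversion rate with a Beta posterior and uses Monte Carlo sampling to estimate the probability that one experience beats the other. It illustrates the general approach, not the product's actual model, and the counts are made up.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical results: (conversions, visitors) for each experience.
a_conversions, a_visitors = 120, 2400
b_conversions, b_visitors = 150, 2400

# With a uniform Beta(1, 1) prior, the posterior conversion rate for an experience
# is Beta(conversions + 1, non-conversions + 1).
samples_a = rng.beta(a_conversions + 1, a_visitors - a_conversions + 1, size=100_000)
samples_b = rng.beta(b_conversions + 1, b_visitors - b_conversions + 1, size=100_000)

# Estimated probability that experience B truly converts better than experience A.
prob_b_beats_a = (samples_b > samples_a).mean()
print(f"P(B beats A) = {prob_b_beats_a:.1%}")
```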