
"Confidence" is short for "statistical confidence," a measure of how sure you can be of the results of an A/B test. It does not measure how effective a campaign was; it measures how certain you can be of the calculated lift.

This Article Explains

This article describes in broad terms the meaning of "confidence" as reported by Evergage.


The numbers we report are based on proven mathematical and statistical concepts (specifically, Bayesian statistics) and the data provided by your campaign. Evergage assigns a confidence rating only when the test and control groups each have 35 goal completions. Although this article explains confidence in a very simplified manner, rest assured that the theory and calculations behind it are sound.
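To make the Bayesian idea more concrete, here is a minimal sketch of one common way to compute such a number: model each group's conversion rate with a Beta posterior and estimate the probability that the test rate exceeds the control rate. This is an illustration only, not Evergage's actual calculation. The function name, the uniform Beta(1, 1) prior, the Monte Carlo approach, and the reading of the threshold as "at least 35" are all assumptions.

```python
import random

# Threshold stated in this article; interpreting it as "at least 35" is an assumption.
MIN_GOAL_COMPLETIONS = 35


def confidence_test_beats_control(test_goals, test_users,
                                  control_goals, control_users,
                                  draws=20000, seed=0):
    """Estimate P(test conversion rate > control conversion rate).

    Hypothetical helper: uses uniform Beta(1, 1) priors and Monte Carlo
    sampling. Returns None when either group has too few goal completions.
    """
    if min(test_goals, control_goals) < MIN_GOAL_COMPLETIONS:
        return None  # not enough data to report a confidence rating

    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    wins = 0
    for _ in range(draws):
        # Draw one plausible conversion rate for each group from its posterior.
        test_rate = rng.betavariate(1 + test_goals,
                                    1 + test_users - test_goals)
        control_rate = rng.betavariate(1 + control_goals,
                                       1 + control_users - control_goals)
        if test_rate > control_rate:
            wins += 1
    return wins / draws
```

For example, `confidence_test_beats_control(120, 1000, 60, 1000)` returns a value very close to 1, while two groups with identical counts return a value near 0.5. Under this sketch, the confidence that the control beats the test is simply one minus the returned value.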

Example 1

Suppose you publish a campaign with 50% of your users in the Test campaign group (those who will see the Evergage message(s)) and the other 50% in the Control group (those who will not see anything). After two weeks of running it, let's say you get the following conversion rates per day:

As you can see, the conversion rates vary by day; this happens with real data since the conversion rate isn't going to be constant for every day. Some days it will be higher, and other days it will be lower, as seen in the above example.

The Test group (the group that saw the message) clearly has a higher conversion rate than the Control group; the average for Test is much higher than the average for Control. But also notice this: the Control group oscillates between a minimum of 0 and a maximum of 0.5. In contrast, the Test group has a minimum of 0.6, so it never falls into the Control group's range. Because of this, we are extremely confident that the Test group had a higher conversion rate than the Control group.

Example 2

Now suppose you run a completely different campaign and get the following results:

Here the Test group still has a higher average than the control, so we would still think the test group has a higher conversion rate than the control. However, notice two things compared to the last graph.

1. The distance between the averages is much smaller. The two averages are closer together.

2. This time, the test group does fall within the range of the control group. On even days, the control group actually does better than the test group. 

These two points don't change the fact that the test group has a higher average than the control group. However, they should make us less confident in our result. In this second scenario, we are less sure that the test group really was better than the control.

This is the intuition that confidence is trying to capture. However, the confidence that is reported is not based on guesswork or arbitrary judgments; it is based on your data and on sound math and statistics.
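The two signals discussed in the examples (how far apart the averages are, and whether the groups' ranges overlap) can be sketched in a few lines. The daily rates below are made-up illustrative values chosen to match the descriptions in Example 1 and Example 2. They are not the data from the article's graphs, and the `summarize` helper is hypothetical. This only illustrates the intuition; it is not how the reported confidence is calculated.

```python
from statistics import mean


def summarize(test_rates, control_rates):
    """Return the intuitive signals from the examples (hypothetical helper)."""
    gap = mean(test_rates) - mean(control_rates)
    # Ranges overlap when neither group's values sit entirely above the other's.
    ranges_overlap = (min(test_rates) <= max(control_rates)
                      and min(control_rates) <= max(test_rates))
    # Count the days on which the control group beat the test group.
    days_control_wins = sum(1 for t, c in zip(test_rates, control_rates) if c > t)
    return {"gap": gap,
            "ranges_overlap": ranges_overlap,
            "days_control_wins": days_control_wins}


# Illustrative data in the spirit of Example 1: test never dips into control's range.
example_1 = summarize(test_rates=[0.6, 0.8, 0.7, 0.9, 0.6, 0.8],
                      control_rates=[0.0, 0.5, 0.1, 0.4, 0.2, 0.5])

# Illustrative data in the spirit of Example 2: closer averages, control wins on even days.
example_2 = summarize(test_rates=[0.5, 0.3, 0.5, 0.3, 0.5, 0.3],
                      control_rates=[0.2, 0.4, 0.2, 0.4, 0.2, 0.4])

print(example_1)
print(example_2)
```

In the first data set the gap between averages is large and the ranges do not overlap, which matches the high-confidence case. In the second, the gap is small, the ranges overlap, and the control wins on some days, which matches the lower-confidence case.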

Even though these examples illustrated cases where the test was better than the control, you can still measure confidence for the opposite result. For example, in a campaign where the test experience has a negative impact on conversion rate (that is, the control outperforms the test), we can be more or less confident depending on the underlying data.