A/B Test Guide

A/B Test Guide · Chapter 3 of 4

How big a jump are you looking for?

In chapter 2, we learned why your baseline conversion rate matters. The size of the change you're looking for matters too. Let's look at why that is.


You choose how big a jump you want to detect

Before you run a test, you have to decide the smallest lift that would actually change what you ship. This is often called the minimum detectable effect.

If you want to detect small changes, you'll need a larger sample size to see a significant difference.

However, if you only need to catch a larger lift, you can get away with a smaller sample size.
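To make the trade-off concrete, here is a minimal sketch of the standard sample-size formula for comparing two conversion rates (a two-sided, two-proportion z-test using the normal approximation). It assumes a 95% confidence level and 80% power. Neither number comes from this chapter: confidence is the subject of chapter 4, and power isn't covered in this guide, so treat both as placeholder defaults. The function name `visitors_per_variant` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def visitors_per_variant(baseline: float, relative_lift: float,
                         confidence: float = 0.95, power: float = 0.80) -> int:
    """Approximate visitors needed in EACH group to detect `relative_lift`.

    Normal-approximation formula for a two-sided, two-proportion z-test.
    `confidence` and `power` are assumed defaults, not values from the guide.
    """
    p_a = baseline                        # control conversion rate (A)
    p_b = baseline * (1 + relative_lift)  # variant rate (B) if the lift is real
    p_bar = (p_a + p_b) / 2               # pooled rate under "no difference"

    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    z_beta = NormalDist().inv_cdf(power)                      # about 0.84 for 80%

    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_a * (1 - p_a) + p_b * (1 - p_b))) ** 2
    # Divide by the squared absolute gap and round up to whole visitors.
    return ceil(numerator / (p_b - p_a) ** 2)


if __name__ == "__main__":
    for lift in (0.02, 0.05, 0.20, 0.50):
        n = visitors_per_variant(baseline=0.10, relative_lift=lift)
        print(f"{lift:>4.0%} lift -> {n:,} visitors per variant")
```

With a 10% baseline, shrinking the target lift from 50% to 5% takes the requirement from under 1,000 visitors per variant to tens of thousands. Because the sample grows roughly with one over the square of the absolute gap, halving the lift roughly quadruples the sample you need.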

There is no single right answer: it depends on your product, your goals, and your resources. A 2% relative improvement in checkout conversion might be worth millions, while a 20% improvement in a feature nobody uses might not be worth your time.

Smaller lifts hide. Bigger ones don't.

Here are the two bells from chapter 2 again. Drag the slider to change the percentage improvement and see what happens.

Figure (interactive): distributions of conversion rate per 1,000-visitor sample for A and B, with a slider for the percentage improvement. Shown at a 5% lift: A at 10.0%, B at 10.5%.

At a 5% lift, the bells overlap heavily. Small improvements like this are easy to miss.

Baseline 10%, N=1,000 per variant.
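The bells in the chart are sampling distributions. If you ran the same 1,000-visitor sample over and over, each group's observed conversion rate would scatter around its true rate with a standard error of sqrt(p(1 - p) / N). The sketch below computes the range covering about 95% of each bell (mean plus or minus 1.96 standard errors, a normal approximation) for the chart's settings. The helper names `bell_range` and `bells_overlap` are hypothetical.

```python
from math import sqrt


def bell_range(rate: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Range holding about 95% of observed rates for one group (normal approximation)."""
    se = sqrt(rate * (1 - rate) / n)  # standard error of the observed rate
    return rate - z * se, rate + z * se


def bells_overlap(baseline: float, relative_lift: float, n: int) -> bool:
    """True if the ~95% ranges of A's and B's bells overlap."""
    _, a_high = bell_range(baseline, n)
    b_low, _ = bell_range(baseline * (1 + relative_lift), n)
    # B sits to the right of A, so compare B's left edge with A's right edge.
    return b_low < a_high


if __name__ == "__main__":
    for lift in (0.02, 0.05, 0.50):
        a = bell_range(0.10, 1_000)
        b = bell_range(0.10 * (1 + lift), 1_000)
        print(f"{lift:.0%} lift: A {a[0]:.1%}-{a[1]:.1%}, "
              f"B {b[0]:.1%}-{b[1]:.1%}, overlap={bells_overlap(0.10, lift, 1_000)}")
```

At a 5% lift the two ranges (roughly 8.1% to 11.9% for A and 8.6% to 12.4% for B) nearly coincide, which is the heavy overlap the chart shows. At a 50% lift, B's range starts at about 12.8%, clear of A's.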

At a 2% lift the bells overlap almost completely. You'd need a very large sample before either group's results could reliably tell you which is ahead; at this sample size, any single draw is mostly noise.

However, at a 50% lift the bells barely touch. The difference is large enough that even a modest sample shows B sitting clearly to the right of A. The lift is doing the work before the statistics have to.
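One way to put a number on "any single draw is noise" is to ask how often, in a single 1,000-visitor test per group, B's observed rate even lands above A's. Under the same normal approximation, the observed difference is centered on the true gap with standard error sqrt(SE_A^2 + SE_B^2), so the chance that B comes out ahead is Φ(gap / SE). This is an illustrative sketch, not a significance test, and `prob_b_looks_ahead` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist


def prob_b_looks_ahead(baseline: float, relative_lift: float, n: int) -> float:
    """Chance that B's observed rate exceeds A's in one test (normal approximation)."""
    p_a = baseline
    p_b = baseline * (1 + relative_lift)
    # Standard error of the difference between two independent observed rates.
    se_diff = sqrt(p_a * (1 - p_a) / n + p_b * (1 - p_b) / n)
    # Probability that a normal variable centered on the true gap is above zero.
    return NormalDist().cdf((p_b - p_a) / se_diff)


if __name__ == "__main__":
    for lift in (0.02, 0.05, 0.50):
        p = prob_b_looks_ahead(0.10, lift, 1_000)
        print(f"{lift:.0%} lift: B looks ahead in {p:.0%} of tests")
```

At a 2% lift, B comes out ahead only a little more often than a coin flip (roughly 56% of runs), so even the direction of the result tells you little. At a 50% lift, it is ahead in nearly every run.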

Aim small and you need a lot of data. Aim too big and you might miss a real win.

Two levers in, one to go

You now have two inputs: baseline (your current conversion rate) and lift, or minimum detectable effect (the smallest improvement worth chasing). Both shape how far apart the two bells sit, and the further apart the bells, the easier they are to tell apart at a given sample size. The remaining question is how “extreme” your variation (Group B) has to look before you declare it a winner.

We'll tackle that in the final chapter. The threshold is called the confidence level, and it's the last piece of the puzzle.

What we learned

  • Minimum detectable effect is the smallest lift you want to be able to detect. It's a business call, not a stat.
  • Small lifts push B's bell close to A's — the difference hides in the overlap. Large lifts pull the bells apart.
  • You choose the lift before the test runs. The data can't tell you what's worth finding.

Next: the bells are set. Now draw the line that separates a real win from a lucky draw.
