You’re running a test on one of your landing pages with the primary goal of getting the user to click on a button. You are split testing (A/B testing) two different button colors (green and orange) to determine the impact of the button color on click-through rate. You collect the following initial results:

| Button Color | Visitors | Clicks | CTR |
| --- | --- | --- | --- |
| Green | 38 | 2 | 5.3% |
| Orange | 39 | 3 | 7.7% |

The orange button is better, right?
Not necessarily. Sure, the orange button has a 7.7% click-through rate (CTR) compared to only 5.3% for the green button. However, the orange button has really only earned one more click. If the next visitor on the page clicks on the green button, both variants will have a 7.7% CTR, indicating that button color is irrelevant in this application.
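To see how fragile those early numbers are, here is a minimal Python sketch of the arithmetic. The `ctr` helper and the variable names are made up for illustration; they are not part of any testing tool.

```python
def ctr(clicks: int, visitors: int) -> float:
    """Click-through rate as the fraction of visitors who clicked."""
    return clicks / visitors


green = ctr(2, 38)
orange = ctr(3, 39)

# Hypothetical next visitor: lands on the green page and clicks.
green_after_one_more = ctr(2 + 1, 38 + 1)

print(f"green:  {green:.1%}")
print(f"orange: {orange:.1%}")
print(f"green after one more click: {green_after_one_more:.1%}")
```

One extra click is enough to erase the gap, which is why a sample this small can't settle the question.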
Here is what happens if we run this experiment for several months with 16,000+ visitors:

| Button Color | Visitors | Clicks | CTR |
| --- | --- | --- | --- |
| Green | 8,238 | 486 | 5.9% |
| Orange | 7,893 | 734 | 9.3% |

Is the orange button better now? Hell yeah, it is.
The danger of deciding on too little data is reaching the wrong conclusion; the danger of collecting too much data is wasted time, effort and money. There was too little data in the first test to say that orange was best, but had you gone with the orange button anyway, you could have sent all 16,000+ visitors above to the page that performs at a 9.3% CTR, giving you a whole lot more sales, leads, etc.
So how many users are required to make this decision? We can use the following equation:

$$n = \left(\frac{Z\,\sigma}{E}\right)^2$$

Where:
- n is the minimum sample size required to prove that the two variants are statistically different.
- Z is the z-value corresponding to the chosen confidence level in the Table of the Standard Normal Distribution.
- E is typically known as the “error.” In this application, E is the difference between the mean values of two samples.
- σ is the standard deviation.
The Z-score that corresponds to 95% confidence is 1.64 (the one-tailed value, 1.645 rounded down, from the Table of the Standard Normal Distribution).
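If you don't have a table handy, the z-value can be looked up in code. This is a small sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative. It assumes the 1.64 above is the one-tailed value, and shows the two-tailed value next to it for comparison.

```python
from statistics import NormalDist

confidence = 0.95
standard_normal = NormalDist()  # mean 0, standard deviation 1

# One-tailed: all of the remaining 5% sits in a single tail.
z_one_tailed = standard_normal.inv_cdf(confidence)

# Two-tailed: the remaining 5% is split across both tails.
z_two_tailed = standard_normal.inv_cdf(1 - (1 - confidence) / 2)

print(f"one-tailed z: {z_one_tailed:.3f}")
print(f"two-tailed z: {z_two_tailed:.3f}")
```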
For example, let’s say you are using the data in the second table above, and you want to be 95% confident in your decision. E is the difference between the two sample means, so
E = 0.093 (orange CTR) – 0.059 (green CTR)
E = 0.034
To figure out the standard deviation, we can treat binary data (click or no click) like continuous data because of the Central Limit Theorem, which states that as a sample gets larger, the distribution of its mean approaches a normal distribution. So, determine the overall CTR as follows:
Total clicks = 486 + 734 = 1,220
Total visitors = 8,238 + 7,893 = 16,131
That means the overall conversion rate was
1,220 / 16,131 = 0.0756
In Excel, you can get the σ-value by using the function =NORM.S.INV(1-0.0756), which returns 1.435.
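If you'd rather not open Excel, the same steps can be sketched in Python. `NormalDist().inv_cdf` from the standard library plays the role of `NORM.S.INV` here; the variable names are just for illustration.

```python
from statistics import NormalDist

# Pool both variants from the second table.
total_clicks = 486 + 734
total_visitors = 8_238 + 7_893
overall_ctr = total_clicks / total_visitors

# Counterpart of Excel's =NORM.S.INV(1 - overall_ctr)
sigma = NormalDist().inv_cdf(1 - overall_ctr)

print(f"overall CTR: {overall_ctr:.4f}")
print(f"sigma value: {sigma:.3f}")
```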
Putting all of that together, the equation to determine what minimum sample size is required to show an accurate, statistically significant improvement of CTR for the tested page with the orange button is:
$$n = \left(\frac{1.64 \times 1.435}{0.034}\right)^2$$
n = 4791
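Here is the same calculation as a short Python sketch, so you can plug in your own numbers. The function name `min_sample_size` is made up for this example; the inputs are the values worked out above.

```python
def min_sample_size(z: float, sigma: float, error: float) -> float:
    """Minimum sample size from n = (Z * sigma / E) ** 2."""
    return (z * sigma / error) ** 2


z = 1.64       # z-value used in this article for 95% confidence
sigma = 1.435  # value returned by NORM.S.INV(1 - 0.0756)
error = 0.034  # orange CTR (0.093) minus green CTR (0.059)

n = min_sample_size(z, sigma, error)
print(f"minimum sample size: {round(n)}")
```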
Therefore, in this example you would need at least 4791 samples to make a statistically significant decision about which variant is better.
Most A/B testing software will take care of this for you. What you really need to understand about the equation is:
- As standard deviation increases (more variation in your conversion rate), you will need more samples.
- If you want more confidence (95% versus 90%), you will need more samples.
- As the difference in performance between the two variants becomes smaller, you will need more samples (it takes more data to make sure the difference isn’t just statistical noise).
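The three relationships above can be checked with a quick Python sketch. The "what if" inputs (a sigma of 2.0, a halved difference of 0.017) are invented purely to show the direction of each effect, and the z-values come from the standard library as one-tailed values.

```python
from statistics import NormalDist


def min_sample_size(z: float, sigma: float, error: float) -> float:
    """Minimum sample size from n = (Z * sigma / E) ** 2."""
    return (z * sigma / error) ** 2


z_90 = NormalDist().inv_cdf(0.90)  # one-tailed z for 90% confidence
z_95 = NormalDist().inv_cdf(0.95)  # one-tailed z for 95% confidence

baseline = min_sample_size(z_95, 1.435, 0.034)

# Hypothetical variations, changing one input at a time.
more_variation = min_sample_size(z_95, 2.0, 0.034)     # larger sigma
less_confidence = min_sample_size(z_90, 1.435, 0.034)  # 90% instead of 95%
smaller_gap = min_sample_size(z_95, 1.435, 0.017)      # half the difference

print(f"baseline:        {baseline:,.0f}")
print(f"more variation:  {more_variation:,.0f}")
print(f"90% confidence:  {less_confidence:,.0f}")
print(f"half the gap:    {smaller_gap:,.0f}")
```

Because E is squared in the denominator, halving the difference between the variants quadruples the required sample size.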
Easy, right? Now get to work.