Your designer wants to change the signup button from blue to green. Someone suggests running an A/B test. Two weeks later, the green button shows a 2% lift with a p-value of 0.04. Victory is declared. But nobody asked whether a 2% lift on button colour was the highest-leverage experiment the team could have run — or whether the result would hold up over time.
A/B testing is the most rigorous tool product teams have for measuring the impact of changes. When done right, it removes guesswork and reveals what actually works. When done wrong, it creates false confidence, wastes time on trivial optimisations, and gives teams the illusion of being data-driven while ignoring the decisions that matter most.
The Core Idea
An A/B test randomly splits users into two groups. Group A sees the current version (the control). Group B sees a new version (the variant). Both groups are measured on the same success metric. If the variant performs meaningfully better than the control — with statistical significance — you have evidence that the change works.
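In practice, the random split needs to be stable: a user who lands in group B must keep seeing the variant on every visit. One common way to get a stable 50/50 split is to hash the user ID together with an experiment name. The sketch below assumes Python; the function name `assign_group` and the experiment name are illustrative, not part of any particular framework.

```python
import hashlib

def assign_group(user_id: str, experiment: str = "signup-button-colour") -> str:
    """Deterministically assign a user to control (A) or variant (B).

    Hashing the user ID together with the experiment name gives a
    stable split: the same user always gets the same group, and
    different experiments produce independent splits.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # a number from 0 to 99
    return "A" if bucket < 50 else "B"

# The same user always lands in the same group on repeat visits
assert assign_group("user-123") == assign_group("user-123")
```

Keying the hash on the experiment name matters: without it, the same users would land in group B of every experiment, and results from concurrent tests would be entangled.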
Statistical significance is what separates a proper A/B test from casual observation. It means the observed difference is unlikely to be caused by chance alone. The standard threshold is a p-value below 0.05, meaning that, if the change truly had no effect, a difference at least as large as the one observed would appear by random variation less than 5% of the time. Reaching significance requires enough users and enough time — which is why A/B testing does not work well for low-traffic products.
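For a conversion metric, the significance check is typically a two-proportion z-test. A minimal sketch, using only the standard library (the function name is illustrative, and the numbers echo the opening anecdote: a 2% relative lift on a 10% baseline):

```python
import math

def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided two-proportion z-test.

    Returns the p-value: how often a difference at least this large
    would appear by chance if the change truly had no effect.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value via the standard normal survival function
    return math.erfc(abs(z) / math.sqrt(2))

# 10,000 users per group: 10.0% vs 10.2% conversion (a 2% relative lift)
p = two_proportion_p_value(1000, 10000, 1020, 10000)
print(f"p-value: {p:.3f}")  # well above 0.05 — not significant at this size
```

Note that with 10,000 users per group, this 2% relative lift is nowhere near significance; detecting an effect that small reliably takes far more traffic, which is exactly why low-traffic products struggle with A/B testing.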