Decision making with Bayesian statistics
Overview
This topic explains how to make decisions about which variation to choose as the winner in a LaunchDarkly experiment that uses Bayesian statistics.
In cases where you have many metrics to consider, it can be difficult to come up with a consistent decision-making strategy. This is why we recommend that you choose a single, primary metric or evaluation criterion for making decisions before you start the experiment. When using Bayesian statistics, the winning treatment variation is typically the one with the highest probability of being the best among those that have a strong likelihood of outperforming the control. If all treatment variations have a low probability of beating the control, then the control is considered the winner. The other statistics can help you further understand the differences between variations.
Example: Search engine optimization
Consider an example where you need to choose between four search engines for your website, and you’re evaluating each one based on conversion rate. You're measuring two types of probabilities for each search engine:
- Probability to beat control: the likelihood that the treatment search engine will achieve a higher conversion rate than the control search engine
- Probability to be best: the likelihood that the search engine will achieve a higher conversion rate than all other search engines
This table displays each search engine's probability to beat control, probability to be best, and expected loss:
| Variation | Probability to beat control | Probability to be best | Expected loss |
|---|---|---|---|
| Search engine 1 (control) | | 0.04% | |
| Search engine 2 | 91.21% | 25.54% | 0.20% |
| Search engine 3 | 97.54% | 8.26% | 0.02% |
| Search engine 4 | 92.84% | 66.16% | 0.25% |
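If you're curious how probabilities like these can be estimated, the following Python sketch uses Monte Carlo sampling from a Beta posterior for each variation's conversion rate. The conversion counts and the uniform Beta(1, 1) prior are illustrative assumptions; they are not the data behind the table above, and this is not necessarily the exact model LaunchDarkly uses.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical conversion counts per search engine: (conversions, visitors).
# These numbers are illustrative assumptions, not the data behind the table above.
data = {
    "Search engine 1 (control)": (480, 10_000),
    "Search engine 2": (520, 10_000),
    "Search engine 3": (510, 10_000),
    "Search engine 4": (530, 10_000),
}

# With a uniform Beta(1, 1) prior, each conversion rate's posterior is
# Beta(1 + conversions, 1 + non-conversions). Draw samples from each posterior.
n = 100_000
samples = {
    name: rng.beta(1 + conv, 1 + visits - conv, size=n)
    for name, (conv, visits) in data.items()
}
control = samples["Search engine 1 (control)"]

# Probability to beat control: the share of posterior draws in which the
# treatment's conversion rate exceeds the control's.
for name, s in samples.items():
    if name != "Search engine 1 (control)":
        print(f"{name}: P(beat control) = {(s > control).mean():.2%}")

# Probability to be best: the share of draws in which a variation has the
# highest conversion rate of all four.
stacked = np.column_stack(list(samples.values()))
wins = np.bincount(stacked.argmax(axis=1), minlength=len(samples))
for name, w in zip(samples, wins):
    print(f"{name}: P(best) = {w / n:.2%}")
```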
Switching between variations in a feature flag is a small configuration change, but there might be other costs associated with switching. In practice, switching to a new variation often incurs costs in terms of time, money, or resources. For example, implementing a new search engine might require additional hardware, software integration, or employee training. Switching to variations with only marginal or uncertain improvements can lead to unnecessary disruptions or even negative outcomes if the observed effects don’t hold up over time. Therefore, you might only consider switching to search engines 2, 3, or 4 if there’s more than a 90% likelihood that the observed improvement over the control search engine is genuine and not due to random chance.
In this scenario, the most logical choice is search engine 4, as it has both a high probability of outperforming the control and the highest chance of achieving the highest conversion rate among all four options. This approach can be considered an "optimal strategy" for decision-making.
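As a rough illustration of this strategy, the sketch below filters variations by a probability-to-beat-control threshold and then picks the one with the highest probability to be best. The values in `results` come from the table above; the 90% threshold is this example's assumption, not a universal rule.

```python
# Minimal sketch of the decision rule described above: among variations whose
# probability to beat control clears a chosen threshold, pick the one with the
# highest probability to be best; otherwise, keep the control.
results = {
    "Search engine 2": {"p_beat_control": 0.9121, "p_best": 0.2554},
    "Search engine 3": {"p_beat_control": 0.9754, "p_best": 0.0826},
    "Search engine 4": {"p_beat_control": 0.9284, "p_best": 0.6616},
}

THRESHOLD = 0.90  # illustrative minimum probability to beat control

candidates = {
    name: stats for name, stats in results.items()
    if stats["p_beat_control"] > THRESHOLD
}

if candidates:
    winner = max(candidates, key=lambda name: candidates[name]["p_best"])
    print(f"Winner: {winner}")  # prints "Winner: Search engine 4"
else:
    print("Winner: control (no treatment cleared the threshold)")
```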
However, it’s important to consider both risk and performance. You should assess the potential downside, or expected loss, in cases where there’s a small chance that the winning variation fails to deliver a genuine improvement. In our example, this means evaluating whether it would be acceptable for the conversion rate to drop by 0.25% if the chosen search engine ultimately fails to outperform the control. To learn more, read Expected loss.
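As a sketch of how an expected loss like the 0.25% above could be estimated, the snippet below averages the conversion-rate shortfall versus the control over the posterior draws in which the chosen variation underperforms. The Beta posteriors reuse the illustrative assumptions from the earlier sketch, and definitions of expected loss vary, so treat this as one common formulation rather than the exact calculation LaunchDarkly performs.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Posterior draws for the control's and the chosen variation's conversion
# rates, using the same illustrative Beta posteriors as the sketch above.
control = rng.beta(1 + 480, 1 + 10_000 - 480, size=100_000)
chosen = rng.beta(1 + 530, 1 + 10_000 - 530, size=100_000)

# Expected loss relative to the control: the average conversion-rate shortfall,
# counting only the draws in which the chosen variation underperforms.
expected_loss = np.maximum(control - chosen, 0).mean()
print(f"Expected loss vs. control: {expected_loss:.2%}")
```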
Decision making when the best option changes
You may find in some experiments that the winning variation changes from day to day. For example, on Monday variation 1 is the winner, and on Tuesday variation 2 is the winner. This typically happens when there is no real difference between the variations, so the results shift slightly from day to day depending on which end users encounter the experiment.
Additionally, variations may perform differently over time due to seasonal effects. For example, a "weekend effect" can occur when user behavior shifts significantly between weekdays and weekends, leading one variation to appear as the winner on certain days, only to be outperformed by another on different days. If you suspect a weekend effect or other seasonal trends in your experiment and are seeking a holistic view, make sure the experiment runs long enough to capture several complete weekly or seasonal cycles. This will help smooth out time-based fluctuations and provide a clearer, more accurate view of each variation's performance.
To learn more about Bayesian statistics, read Experimentation and Bayesian statistics.