Statistical significance metric overview

Learn how statistical significance helps find the winning variation.

Statistical significance (stat. sig.) is a metric in traditional test optimizations that indicates how reliable your results are: how likely it is that they reflect a real difference between variations rather than random chance.

In statistical analysis, statistical significance is calculated with standard statistical formulas and indicates whether a specific result is reliable. The idea is that if you take a sample of data from a larger group of people, you want to know how confident you can be that the results from that sample reflect the larger group.

How is statistical significance used in traditional test optimizations?

After enough data is collected, Webflow calculates statistical significance and shows it as an additional metric. This lets you verify that the results you see are probably driven by the variations in your optimization and not by random spikes in data caused by outside influences or pure chance.

The "test" is considered complete when a high statistical significance is reached. Webflow then tells you which variation is the winner.

Each variation's metrics are compared to the Base (no change) variation. Statistical significance describes how reliable each of those comparisons is.

  • Higher stat. sig. means that you can confidently assume that the results are stable
  • Lower stat. sig. means that there's a greater chance of the metrics fluctuating
  • <1% stat. sig. means that there's not enough data and the metrics will likely fluctuate a lot
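
To make the comparison against Base more concrete, here is a minimal sketch of one common way to estimate statistical significance for conversion rates: a two-sided two-proportion z-test. This is an illustration only. The function name `stat_sig`, its parameters, and the choice to report 100 × (1 − p-value) as a percentage are assumptions made for this example, not a description of the formula Webflow uses.

```python
import math


def stat_sig(base_conversions, base_visitors, var_conversions, var_visitors):
    """Illustrative significance percentage (0-100) for a variation vs. Base.

    Uses a two-sided two-proportion z-test. Hypothetical helper, not
    Webflow's actual calculation.
    """
    p_base = base_conversions / base_visitors
    p_var = var_conversions / var_visitors

    # Pooled conversion rate under the assumption that both groups behave the same.
    pooled = (base_conversions + var_conversions) / (base_visitors + var_visitors)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / base_visitors + 1 / var_visitors))
    if std_err == 0:
        return 0.0  # no variability in the data, so nothing can be concluded

    z = (p_var - p_base) / std_err
    # Two-sided p-value for a standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return (1 - p_value) * 100


# Same rates (10% vs. about 13%) observed with few and with many visitors.
print(stat_sig(3, 30, 4, 30))
print(stat_sig(100, 1000, 133, 1000))
```

With only 30 visitors per group the value stays low, while the same kind of gap observed across 1,000 visitors per group produces a much higher value. Identical results in both groups give 0.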

What goes into making a higher statistical significance?

  • Time matters: data is based on visitor behavior, so it's important to let enough time pass to account for random spikes in data. For example, running an experience for seven days is not as insightful as running it for 30 days.
  • Sample size matters: every variation needs a large enough sample size to produce meaningful results to compare. For example, 30 visitors in each group tell you much less than 1,000 visitors in each group.
  • Effect size matters: the size of the gap between a variation's results and the Base (no change) results also plays a part. If the gap is very small, you might need more time to reach a higher statistical significance. If a very large gap is sustained for a period of time, you might need less time to reach a higher statistical significance.
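
The link between effect size and the amount of data you need can be sketched with the standard sample-size approximation for comparing two conversion rates. The function name `visitors_needed` and the default 95% confidence and 80% power thresholds are assumptions chosen for illustration. Webflow's own thresholds and method are not stated here.

```python
import math
from statistics import NormalDist


def visitors_needed(base_rate, variation_rate, confidence=0.95, power=0.80):
    """Approximate visitors needed per group to detect the given gap.

    Standard two-proportion approximation. Hypothetical helper with
    assumed default thresholds, for illustration only.
    """
    gap = variation_rate - base_rate
    if gap == 0:
        raise ValueError("The two rates are identical, so there is no gap to detect.")

    # z-scores for a two-sided confidence level and for the desired power.
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)

    # Sum of the per-visitor variances of the two conversion rates.
    variance = base_rate * (1 - base_rate) + variation_rate * (1 - variation_rate)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / gap ** 2)


# Small gap (10% vs. 11%) compared with a large gap (10% vs. 15%).
print(visitors_needed(0.10, 0.11))
print(visitors_needed(0.10, 0.15))
```

Under these assumptions, the one-point gap needs on the order of 15,000 visitors per group, while the five-point gap needs fewer than 1,000. A smaller gap therefore takes more traffic, and usually more time, to reach a high statistical significance.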