The problem

Your experiment needs 50,000 users per arm to detect a 3% lift. You have the traffic, but at your current allocation it'll take 6 weeks. Your PM wants results in 3.

You can't change the MDE or the significance level without sacrificing quality. But you can change the variance of your outcome metric — and that's what CUPED does.

The intuition

CUPED (Controlled-experiment Using Pre-Experiment Data) reduces outcome variance by removing the variation in your metric that's explained by users' own pre-experiment behaviour.

The idea: if a user had $200 revenue last month, they'll probably have above-average revenue this month regardless of what you do. That pre-existing variation is noise as far as your treatment effect estimate is concerned. CUPED subtracts it out.

Formally, you construct an adjusted outcome:

Y_cuped = Y - θ * (X - E[X])

Where X is the pre-experiment covariate (e.g., last month's revenue), θ is the regression coefficient of Y on X, and E[X] is the population mean of X. The adjusted metric has lower variance but the same expected treatment effect — making your test more sensitive without changing the sample size or significance level.

CUPED doesn't change what you're estimating. It just estimates it with less noise, using information you already had.
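
For readers who want to see why this works, here is a short derivation sketch. It assumes X is measured before assignment, so randomisation gives it the same expectation in both arms; ρ is introduced here as shorthand for the correlation between Y and X.

```latex
\begin{aligned}
\operatorname{Var}(Y_{\text{cuped}})
  &= \operatorname{Var}(Y) - 2\theta\,\operatorname{Cov}(Y, X) + \theta^{2}\,\operatorname{Var}(X) \\
\theta^{*}
  &= \frac{\operatorname{Cov}(Y, X)}{\operatorname{Var}(X)}
  \quad \text{(the value of } \theta \text{ that minimises the line above)} \\
\operatorname{Var}(Y_{\text{cuped}})\Big|_{\theta = \theta^{*}}
  &= \operatorname{Var}(Y) - \frac{\operatorname{Cov}(Y, X)^{2}}{\operatorname{Var}(X)}
   = \operatorname{Var}(Y)\,(1 - \rho^{2}) \\
E[Y_{\text{cuped}}]
  &= E[Y] - \theta\,\bigl(E[X] - E[X]\bigr) = E[Y]
\end{aligned}
```

The last line holds within each arm, so the difference in means between arms is unchanged while its variance shrinks by a factor of 1 - ρ².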

In practice

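Variance reduction depends on the correlation between your pre-experiment covariate and your outcome. The fraction of variance removed equals the squared correlation, so a correlation of 0.7 removes about half the variance, while a correlation of 0.3 removes less than 10%.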

import numpy as np

def apply_cuped(y_treatment, y_control, x_treatment, x_control):
    x_all = np.concatenate([x_treatment, x_control])
    y_all = np.concatenate([y_treatment, y_control])

    # Estimate theta via OLS: Cov(Y, X) / Var(X).
    # np.cov defaults to ddof=1, so np.var must use ddof=1 as well.
    theta = np.cov(y_all, x_all)[0, 1] / np.var(x_all, ddof=1)
    x_mean = x_all.mean()

    y_adj_t = y_treatment - theta * (x_treatment - x_mean)
    y_adj_c = y_control - theta * (x_control - x_mean)

    return y_adj_t, y_adj_c
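
To see the effect end to end, here is a minimal sketch on synthetic data. Everything about the data is an assumption made for illustration (gamma-distributed pre-period revenue, a linear relationship with noise, a true lift of 3.0), and `diff_and_se` is a hypothetical helper, not part of any library. The function is restated so the block runs on its own.

```python
import numpy as np

def apply_cuped(y_treatment, y_control, x_treatment, x_control):
    # Restated from the text so this example is self-contained
    x_all = np.concatenate([x_treatment, x_control])
    y_all = np.concatenate([y_treatment, y_control])
    theta = np.cov(y_all, x_all)[0, 1] / np.var(x_all, ddof=1)
    x_mean = x_all.mean()
    y_adj_t = y_treatment - theta * (x_treatment - x_mean)
    y_adj_c = y_control - theta * (x_control - x_mean)
    return y_adj_t, y_adj_c

def diff_and_se(a, b):
    # Hypothetical helper: difference in means and its standard error
    diff = a.mean() - b.mean()
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return diff, se

# Synthetic data (assumed): pre-period revenue explains most of the outcome
rng = np.random.default_rng(42)
n = 20_000
true_lift = 3.0
x_t = rng.gamma(shape=2.0, scale=50.0, size=n)
x_c = rng.gamma(shape=2.0, scale=50.0, size=n)
y_c = 0.8 * x_c + rng.normal(0, 30, size=n)
y_t = 0.8 * x_t + rng.normal(0, 30, size=n) + true_lift

raw_diff, raw_se = diff_and_se(y_t, y_c)
y_adj_t, y_adj_c = apply_cuped(y_t, y_c, x_t, x_c)
adj_diff, adj_se = diff_and_se(y_adj_t, y_adj_c)

print(f"Raw:   diff = {raw_diff:.2f}, SE = {raw_se:.3f}")
print(f"CUPED: diff = {adj_diff:.2f}, SE = {adj_se:.3f}")
```

Both estimates target the same lift; the CUPED one should come with a noticeably smaller standard error because the covariate here was built to be highly predictive. Real metrics will usually be less cooperative.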

I've seen CUPED achieve 30–60% variance reduction on revenue metrics when the pre-experiment window is 4 weeks. Required sample size scales linearly with variance, so the sample you need shrinks by the same 30–60%. Around the middle of that range, this halves your required sample size or, equivalently, halves your experiment duration at the same traffic.
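
Applied to the opening scenario, the arithmetic looks like this. It is a back-of-the-envelope sketch that assumes sample size scales linearly with variance and that duration scales linearly with sample size at constant traffic; `adjusted_plan` is a hypothetical helper name.

```python
def adjusted_plan(n_per_arm, weeks, variance_reduction):
    """Scale sample size and duration by the fraction of variance that remains."""
    remaining = 1.0 - variance_reduction
    return n_per_arm * remaining, weeks * remaining

# Baseline from the opening example: 50,000 users per arm, 6 weeks
for vr in (0.30, 0.50, 0.60):
    n_adj, weeks_adj = adjusted_plan(50_000, 6, vr)
    print(f"{vr:.0%} reduction: {n_adj:,.0f} users per arm, {weeks_adj:.1f} weeks")
```

A 50% reduction brings the 6-week experiment down to the 3 weeks your PM asked for; a 30% reduction only gets you to about 4.2 weeks.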

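Choose the covariate carefully. The same metric from the pre-period usually works best. Multiple covariates can be combined, either in a multivariate regression adjustment or by feeding them to an ML model whose prediction becomes the single covariate (the MLRATE approach), but adding uncorrelated covariates doesn't help and adds complexity.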
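
Here is one way the multi-covariate version could look, as a sketch using ordinary least squares on centred covariates. The function name `apply_cuped_multi` and the synthetic data are assumptions for illustration; this is plain regression adjustment, not an implementation of MLRATE.

```python
import numpy as np

def apply_cuped_multi(y, X):
    """Adjust y with several pre-experiment covariates at once.

    y: shape (n,), outcomes pooled across arms
    X: shape (n, k), pre-experiment covariates pooled across arms
    """
    X_centered = X - X.mean(axis=0)
    # Least-squares coefficients of (y - mean) on the centred covariates
    theta, *_ = np.linalg.lstsq(X_centered, y - y.mean(), rcond=None)
    return y - X_centered @ theta, theta

# Synthetic data (assumed): one informative covariate, one pure-noise covariate
rng = np.random.default_rng(0)
n = 10_000
x_useful = rng.normal(100, 20, size=n)
x_noise = rng.normal(0, 1, size=n)  # unrelated to the outcome
y = 0.9 * x_useful + rng.normal(0, 10, size=n)

y_one, _ = apply_cuped_multi(y, x_useful[:, None])
y_two, theta_two = apply_cuped_multi(y, np.column_stack([x_useful, x_noise]))

print(f"Variance, unadjusted:        {np.var(y):.1f}")
print(f"Variance, useful covariate:  {np.var(y_one):.1f}")
print(f"Variance, useful plus noise: {np.var(y_two):.1f}")
```

The second and third variances should be nearly identical: the noise covariate buys almost nothing, which is the point about uncorrelated covariates.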

Going deeper (optional)

CUPED is a special case of ANCOVA (analysis of covariance). The connection: both methods partial out the influence of pre-experiment covariates on the outcome. ANCOVA is the general regression framework and handles any number of covariates; CUPED, in its usual single-covariate form, is the practical implementation that's easy to explain to non-statisticians.

One edge case: if users in the treatment arm had a different pre-experiment experience than control (e.g., they were already receiving a soft version of the treatment), the covariate is contaminated and CUPED will bias your estimate. Always verify that pre-experiment behaviour is balanced across arms before applying CUPED.
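
A balance check can be as simple as comparing the covariate's mean across arms in standard-deviation units. The sketch below uses a standardised mean difference; the function name, the synthetic data, and the 0.05 cut-off are all assumptions for illustration, so pick a threshold that suits your own sample sizes.

```python
import numpy as np

def standardized_mean_difference(x_treatment, x_control):
    """Difference in covariate means, in units of the pooled standard deviation."""
    pooled_sd = np.sqrt(
        (np.var(x_treatment, ddof=1) + np.var(x_control, ddof=1)) / 2
    )
    return (np.mean(x_treatment) - np.mean(x_control)) / pooled_sd

# Synthetic data (assumed): one balanced treatment arm, one contaminated arm
rng = np.random.default_rng(7)
x_c = rng.normal(100, 25, size=20_000)
x_t_ok = rng.normal(100, 25, size=20_000)
x_t_bad = rng.normal(108, 25, size=20_000)  # pre-period already shifted upward

THRESHOLD = 0.05  # hypothetical cut-off, not a standard
smd_ok = standardized_mean_difference(x_t_ok, x_c)
smd_bad = standardized_mean_difference(x_t_bad, x_c)

for label, smd in (("balanced", smd_ok), ("contaminated", smd_bad)):
    verdict = "ok" if abs(smd) < THRESHOLD else "investigate before applying CUPED"
    print(f"{label}: SMD = {smd:+.3f} ({verdict})")
```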

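# Variance reduction factor.
# y_all and x_all are the pooled outcome and covariate arrays
# (treatment and control concatenated, as inside apply_cuped).
r_squared = np.corrcoef(y_all, x_all)[0, 1] ** 2
remaining_variance = 1 - r_squared  # fraction of variance left after adjustment
print(f"Variance reduced by: {r_squared * 100:.1f}%")
print(f"Equivalent sample size multiplier: {1 / remaining_variance:.2f}x")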