The problem
You're launching a new seller support program in Germany. You want to know if it works. You can't randomise — it's a national policy. You don't have a clean control group. After launch, German seller revenue goes up 18%. Did the program cause that?
Maybe. Or maybe revenue was already trending up. Or maybe a competitor exited the market at the same time. Without a counterfactual — "what would German revenue have looked like without the program?" — you can't say.
The intuition
Synthetic control constructs that counterfactual. The idea: find a weighted combination of untreated units (other countries, regions, or cohorts) that closely matches the treated unit's pre-treatment outcome trajectory. That weighted combination becomes your "synthetic Germany" — the counterfactual of what Germany would have looked like if the program hadn't launched.
After treatment, you compare Germany's actual trajectory to its synthetic version. The gap is your estimated causal effect.
The magic is in the weights. You solve an optimisation problem: find non-negative weights that sum to one and minimise the pre-treatment difference between the treated unit and the weighted donor pool. If the synthetic match is good pre-treatment, you can credibly argue it would have continued to track in the absence of treatment.
Synthetic control doesn't require you to believe markets are identical. It only requires that you can build a weighted combination that behaves identically pre-treatment.
In practice
The method works best when you have a small number of treated units (often just one), a long pre-treatment window, and a rich donor pool of untreated units. It's common in policy evaluation, country-level studies, and product launches where A/B testing isn't feasible.
At Amazon I used synthetic control to evaluate a seller incentive program rolled out to one marketplace. The pre-treatment fit was strong (RMSPE < 2%), giving confidence in the counterfactual. The post-treatment gap showed a 9% lift — versus the naive 18% pre-post comparison, which was contaminated by a category-wide trend.
Inference is non-standard. You don't have a standard error in the frequentist sense. Instead, you run placebo tests: apply the same method to each donor unit as if it had been treated, and check whether the treated unit's gap is unusually large relative to the placebo distribution.
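A minimal sketch of that placebo loop, on made-up panel data. For brevity it fits weights with unconstrained least squares rather than the simplex-constrained weights of the real method — the point here is the inference logic (post/pre RMSPE ratios and the treated unit's rank), not the fitting step. Unit 0 plays "Germany" and gets an injected effect; every variable name and number is illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, t_pre, t_post = 8, 20, 8
T = t_pre + t_post

# Toy panel: units share two common trends, plus small idiosyncratic noise
factors = rng.normal(size=(2, T)).cumsum(axis=1)
loadings = rng.uniform(0.5, 1.5, size=(n_units, 2))
Y = loadings @ factors + rng.normal(0.0, 0.1, size=(n_units, T))
Y[0, t_pre:] += 3.0  # inject a treatment effect on unit 0 ("Germany")

def rmspe_ratio(treated):
    """Fit a synthetic unit on the pre-period, return post/pre RMSPE ratio."""
    # Exclude the actually-treated unit from every placebo's donor pool
    donors = [i for i in range(n_units) if i not in (treated, 0)]
    w, *_ = np.linalg.lstsq(Y[donors, :t_pre].T, Y[treated, :t_pre], rcond=None)
    gap = Y[treated] - Y[donors].T @ w  # actual minus synthetic, all periods
    pre = np.sqrt(np.mean(gap[:t_pre] ** 2))
    post = np.sqrt(np.mean(gap[t_pre:] ** 2))
    return post / pre

# Treat each unit "as if" treated; the real treated unit should stand out
ratios = np.array([rmspe_ratio(u) for u in range(n_units)])
p_value = np.mean(ratios >= ratios[0])  # share of units at least as extreme
print(f"treated ratio = {ratios[0]:.2f}, placebo p-value = {p_value:.3f}")
```

If the treated unit's ratio is the largest of the eight, the pseudo p-value is 1/8 — with a small donor pool, that is the finest resolution this test can offer, which is why a rich donor pool matters.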
Going deeper (optional)
The method was introduced by Abadie & Gardeazabal (2003) to study the economic cost of terrorism in the Basque Country; Abadie, Diamond & Hainmueller (2010) formalised the inference with their California tobacco-control study, and their 2015 paper applied it to the economic effect of German reunification. The Synth R package and Python's pysyncon implement it.
Key assumptions: a close pre-treatment fit over a long window (which you can visually verify — an advantage over DiD's parallel-trends assumption, which concerns an unobservable counterfactual), no spillover from the treated unit to the donor units (SUTVA), and no other shock hitting the treated unit at the same time as treatment. The method is also more transparent than DiD because the weights are explicit — you can see exactly which units contribute to the counterfactual and how much.
# Minimise pre-treatment fit error (equivalently, pre-treatment RMSPE):
#   W* = argmin_W ||X_1 - X_0 W||
#   subject to: w_j >= 0, sum_j w_j = 1
# Estimated effect at post-treatment time t:
#   α_t = Y_1t - sum_j w_j* Y_jt
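That constrained least-squares problem can be handed to any quadratic-programming routine; below is a self-contained NumPy sketch that solves it by projected gradient descent onto the simplex. The donor matrix, true weights, and solver choice are all illustrative assumptions — Synth and pysyncon do this properly, with covariate matching and a nested optimisation.

```python
import numpy as np

rng = np.random.default_rng(0)
t_pre, n_donors = 12, 5
X0 = rng.normal(size=(t_pre, n_donors))   # donor pre-treatment outcomes (toy)
w_true = np.array([0.6, 0.3, 0.1, 0.0, 0.0])
x1 = X0 @ w_true                          # treated unit's pre-treatment path

def project_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

# Projected gradient descent on ||X0 w - x1||^2 over the simplex
L = 2.0 * np.linalg.norm(X0, 2) ** 2      # Lipschitz constant of the gradient
w = np.full(n_donors, 1.0 / n_donors)     # start from equal weights
for _ in range(5000):
    grad = 2.0 * X0.T @ (X0 @ w - x1)
    w = project_simplex(w - grad / L)

# With weights in hand, the effect at a post-treatment time t would be
# alpha_t = Y[treated, t] - Y[donors, t] @ w
print(np.round(w, 3))
```

Because the weights are non-negative and sum to one, the synthetic unit is an interpolation of the donors, never an extrapolation — that is where the method's transparency comes from.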