A/B Testing & Causal Inference Simulator
A four-tab interactive dashboard demonstrating experimentation methods used at companies like Netflix, Spotify, Microsoft, and Airbnb.
| Tab | Method | What it demonstrates |
|---|---|---|
| 1 Power Analysis | Z-test power formula | Sample size planning |
| 2 A/B Test Analyzer | Frequentist · Bayesian · CUPED | Multi-method comparison |
| 3 Sequential Testing | mSPRT (Always-Valid Inference) | Safe continuous monitoring |
| 4 Uplift Modeling | CausalForest · X-Learner · T-Learner | Heterogeneous treatment effects |
Dataset (Tabs 3 & 4): Hillstrom E-mail Analytics Challenge — 64,000 customers, 3-arm RCT, 2008.
Sample Size & Power Calculator
Compute the required experiment size before running your A/B test. A well-powered experiment is the foundation of valid inference.
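As a rough illustration of the z-test power formula behind this tab, the sketch below computes the per-group sample size for a two-sided test on conversion rates. The function name, the unpooled-variance form, and the example inputs are assumptions made for illustration; the dashboard's own calculation may differ in detail.

```python
import math
from statistics import NormalDist


def sample_size_per_group(baseline_rate, mde_abs, alpha=0.05, power=0.80):
    """Per-group n for a two-sided two-proportion z-test (unpooled variance).

    baseline_rate: control conversion rate, e.g. 0.10
    mde_abs: minimum detectable effect as an absolute lift, e.g. 0.02
    """
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


# Hypothetical inputs: 10% baseline, 2-point absolute lift.
n = sample_size_per_group(0.10, 0.02)
print(f"Users needed per group: {n}")
```

Halving the detectable effect roughly quadruples the required sample, because the effect size enters the formula squared.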
Multi-Method A/B Test Analysis
Enter observed results from any A/B test to compare Frequentist, Bayesian, and CUPED methods side-by-side.
Group A (Control)
Group B (Treatment)
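A minimal sketch of the three analyses, assuming binary conversion outcomes. The function names, the Beta(1, 1) prior, the Monte Carlo approximation, and all input numbers are illustrative assumptions, not the dashboard's actual code. CUPED needs per-user data with a pre-experiment covariate, so it is shown on simulated values.

```python
import math
import random
from statistics import NormalDist, fmean, pvariance


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Frequentist: two-sided pooled z-test for a difference in rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20_000, seed=0):
    """Bayesian: Monte Carlo estimate of P(rate_B > rate_A), Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += b > a
    return wins / draws


def cuped_adjust(metric, covariate):
    """CUPED: subtract theta * (covariate - mean), with theta = cov / var.

    In practice theta is estimated on data pooled across both groups and
    the covariate is measured before the experiment starts.
    """
    mean_x, mean_y = fmean(covariate), fmean(metric)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, metric))
    var = sum((x - mean_x) ** 2 for x in covariate)
    theta = cov / var
    return [y - theta * (x - mean_x) for x, y in zip(covariate, metric)]


# Hypothetical counts: 1,000/10,000 conversions in A, 1,100/10,000 in B.
z, p_value = two_proportion_ztest(1000, 10_000, 1100, 10_000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print(f"P(B > A) = {prob_b_beats_a(1000, 10_000, 1100, 10_000):.3f}")

# Simulated per-user metric that correlates with its pre-period value.
rng = random.Random(1)
pre = [rng.gauss(10, 2) for _ in range(2000)]
post = [x + rng.gauss(0, 1) for x in pre]
adjusted = cuped_adjust(post, pre)
print(f"Variance before: {pvariance(post):.2f}, after: {pvariance(adjusted):.2f}")
```

The CUPED adjustment leaves the mean unchanged and only removes variance explained by the covariate, which is why it tightens confidence intervals without biasing the estimated effect.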
mSPRT — Always Valid Inference
Fixed-horizon A/B tests lose their false positive guarantee if you check results mid-experiment and stop at the first significant one. The mixture Sequential Probability Ratio Test (mSPRT) lets you monitor continuously without inflating the false positive rate. The results shown are pre-computed from 3,000 simulated experiments.
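To make the peeking problem concrete, the sketch below simulates A/A experiments (no true effect) and checks after every observation with both a naive z-test and an mSPRT. It assumes normally distributed outcomes with known variance and a normal mixing prior with variance `tau2` on the effect. The parameter values and function names are illustrative assumptions, and this is not the simulation used to produce the tab's pre-computed results.

```python
import math
import random
from statistics import NormalDist


def msprt_lambda(n, mean_diff, sigma2, tau2):
    """Mixture likelihood ratio after n observations per group.

    Null hypothesis: zero difference in means. Mixing prior on the
    effect: N(0, tau2). Outcome variance sigma2 is assumed known.
    """
    denom = 2 * sigma2 + n * tau2
    exponent = (n ** 2) * tau2 * mean_diff ** 2 / (4 * sigma2 * denom)
    return math.sqrt(2 * sigma2 / denom) * math.exp(exponent)


def simulate_aa(runs=400, n_max=500, alpha=0.05, sigma2=1.0, tau2=0.1, seed=0):
    """Share of A/A experiments that ever reject when checked at every n."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    sd = math.sqrt(sigma2)
    naive_hits = msprt_hits = 0
    for _ in range(runs):
        sum_a = sum_b = 0.0
        naive_rejected = msprt_rejected = False
        for n in range(1, n_max + 1):
            sum_a += rng.gauss(0, sd)
            sum_b += rng.gauss(0, sd)
            diff = (sum_b - sum_a) / n
            z = diff / math.sqrt(2 * sigma2 / n)
            # Naive peeking: stop the first time |z| crosses the fixed threshold.
            naive_rejected = naive_rejected or abs(z) > z_crit
            # mSPRT: stop the first time the mixture ratio reaches 1 / alpha.
            msprt_rejected = msprt_rejected or (
                msprt_lambda(n, diff, sigma2, tau2) >= 1 / alpha
            )
        naive_hits += naive_rejected
        msprt_hits += msprt_rejected
    return naive_hits / runs, msprt_hits / runs


naive_rate, msprt_rate = simulate_aa()
print(f"False positive rate with naive peeking: {naive_rate:.3f}")
print(f"False positive rate with mSPRT:         {msprt_rate:.3f}")
```

The always-valid p-value at step n is the running minimum of 1 / Λ over all steps so far, so rejecting when Λ first reaches 1 / α is the same as rejecting when that p-value first drops to α or below.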
Heterogeneous Treatment Effect (HTE) Estimation
Not all users respond equally to a treatment. Uplift modeling estimates how the treatment effect varies with user characteristics, known as the conditional average treatment effect (CATE), using three ML-based causal estimators from Microsoft EconML.
Data: Hillstrom E-mail Analytics Challenge (64,000 customers, randomized 3-arm experiment). Treatment: any marketing email vs. no email.
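The T-Learner is the simplest of the three estimators to sketch: fit one outcome model on treated users, another on control users, and take the difference of their predictions. The example below is a hand-rolled version using ordinary least squares on synthetic data with a known effect. It does not use EconML or the Hillstrom data, and every name and number in it is an assumption for illustration.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns the coefficients."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Predictions from coefficients produced by fit_linear."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def t_learner_cate(X, treatment, y):
    """T-Learner: separate outcome models per arm, CATE = difference."""
    coef_control = fit_linear(X[treatment == 0], y[treatment == 0])
    coef_treated = fit_linear(X[treatment == 1], y[treatment == 1])
    return predict_linear(coef_treated, X) - predict_linear(coef_control, X)


# Synthetic randomized experiment: the true effect grows with feature 0.
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 2))
treatment = rng.integers(0, 2, size=n)  # random 50/50 assignment
true_cate = 1.0 + 2.0 * X[:, 0]
y = X[:, 1] + treatment * true_cate + rng.normal(scale=0.5, size=n)

cate_hat = t_learner_cate(X, treatment, y)
print(f"Estimated average effect: {cate_hat.mean():.2f}")
print(f"Correlation with true CATE: {np.corrcoef(cate_hat, true_cate)[0, 1]:.3f}")
```

Because assignment is randomized, the two arms are comparable and the difference in fitted outcomes can be read as a treatment effect. On real data the linear models would usually be replaced by more flexible learners.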
Built by Muhammad Fikri Wahidin · GitHub · Methods: CUPED (Microsoft 2013) · mSPRT (Johari et al. 2015) · CausalForestDML (Athey & Wager 2019, via EconML)