A/B testing with lognormally distributed data

Hi! I’m hoping for some advice, as I’m new to this :)
I have some data which I believe is distributed lognormally. I’m using a normal prior for the mean and a Cauchy prior for the standard deviation. I’m performing A/B testing and calculating two values: the probability that the mean of the B variant is greater than the mean of the A variant, and the lift I could expect if I chose the B variant. Unfortunately, I can’t share the data. The Stan code is as follows:

data {
  real mu_prior;
  real sigma_prior;

  int<lower=0> control_n;
  vector<lower=0>[control_n] revA;

  int<lower=0> var_n;
  vector<lower=0>[var_n] revB;
}

parameters {
  real muA;
  real<lower=0> sigmaA;
  real muB;
  real<lower=0> sigmaB;
}

model {
  // priors (the lower=0 constraint makes the Cauchy priors half-Cauchy)
  muA ~ normal(mu_prior, 2);
  sigmaA ~ cauchy(sigma_prior, 3);
  muB ~ normal(mu_prior, 2);
  sigmaB ~ cauchy(sigma_prior, 3);

  // likelihood
  revA ~ lognormal(muA, sigmaA);
  revB ~ lognormal(muB, sigmaB);
}

generated quantities {
  // difference in means is the quantity of interest
  real mu_diff;
  real post_revenue_a;
  real post_revenue_b;
  real revenue_diff;

  mu_diff = muB - muA;
  post_revenue_a = lognormal_rng(muA, sigmaA);
  post_revenue_b = lognormal_rng(muB, sigmaB);
  revenue_diff = post_revenue_b - post_revenue_a;
}

Sometimes I get a high probability that variant B has a higher mean than variant A, but a negative average for the revenue_diff term. Does this make sense?


Hey! Note that \mu_A and \mu_B are estimates for the mean of \log(\text{rev}_A) and \log(\text{rev}_B). To get an estimate for the mean of \text{rev}_A, for example, you need to calculate \exp(\mu_A+\sigma^2_A/2), or exp(muA + 0.5*square(sigmaA)) in Stan code. An estimate for the median of \text{rev}_A would be just \exp(\mu_A). So if you just look at the \mu's, you are more or less comparing the medians of the two distributions. And depending on the spread and tails of the two distributions, revenue_diff could take an unexpected sign.
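For example, a revised generated quantities block could compare the actual means directly (just a sketch; mean_a, mean_b, and mean_diff are illustrative names, not from your model):

generated quantities {
  // posterior means of the two lognormal distributions:
  // E[rev] = exp(mu + sigma^2 / 2)
  real mean_a = exp(muA + 0.5 * square(sigmaA));
  real mean_b = exp(muB + 0.5 * square(sigmaB));
  // difference in means; the proportion of draws with
  // mean_diff > 0 estimates Pr(mean_B > mean_A)
  real mean_diff = mean_b - mean_a;
}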

I’d just plot the two distributions next to each other and compare visually (if that’s feasible) to see if your results make sense. Hope this helps! :)

Cheers,
Max


Oh wow, thank you! I think this fixed it :)


In addition to what @Max_Mantei said: if the “sometimes” here refers to “for some draws from the posterior” or “for some datasets where you do not have a lot of data”, the prior on the sds could be the driver of what you are seeing. I would not advise using a heavy-tailed distribution like the half-Cauchy for the scale of a lognormal. As @Max_Mantei said, the mean is e^{\mu + \sigma^2/2}, or e^{\mu} e^{\sigma^2/2}. In other words, the heavy-tailed prior affects the mean as a multiplicative factor after squaring and exponentiating. You would need a good amount of data to overcome that prior.


Very good point, @stijn! Thanks!

Thanks for pointing that out! Do you have a suggestion for a lighter-tailed prior?

It really depends on the scale of your outcome. I typically start with normal(0, 1) (effectively half-normal, given the positivity constraint) and do some simulations in R to see what that gets me. I would start by investigating the effect of the prior on the factor e^{\sigma^2/2}:

> sds <- abs(rnorm(1e5))
> factor <- exp(sds^2/2)
> mean(factor > 1e3)
[1] 2e-04
> mean(factor > 1e2)
[1] 0.00222
> mean(factor > 1e1)
[1] 0.03173
> mean(factor > 5)
[1] 0.07164
> mean(factor > 2)
[1] 0.23969

For instance, the last line means that this prior implies a 24% chance of a multiplicative factor larger than 2.

Here is the same check for the half-Cauchy prior. You can see that extreme outcomes are much more likely:

> sds <- abs(rcauchy(1e5))
> factor <- exp(sds^2/2)
> mean(factor > 1e3)
[1] 0.16561
> mean(factor > 1e2)
[1] 0.20097
> mean(factor > 1e1)
[1] 0.27571
> mean(factor > 5)
[1] 0.32131
> mean(factor > 2)
[1] 0.44584

For a more structured way of doing this, you can look up prior predictive checks.
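For example, a minimal version of such a check for this model might look like this (a sketch; the prior values are illustrative, not taken from your data):

# simulate revenues implied by the priors alone
mu    <- rnorm(1e5, 0, 2)        # normal(mu_prior, 2), with mu_prior = 0 for illustration
sigma <- abs(rnorm(1e5, 0, 1))   # half-normal(0, 1), as suggested above
rev_sim <- rlnorm(1e5, meanlog = mu, sdlog = sigma)
quantile(rev_sim, c(0.5, 0.9, 0.99))  # do these look like plausible revenues?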
