I finally found a good opportunity to use Bayesian methods, but I’m not sure I’m doing this right.
The question I want to answer is whether giving discounts to customers (companies in this case) increases the amount of business they do with me. The code is below, but unfortunately I can’t share the data.
We have 20 companies. The before and after periods have the same length within each company, but the length differs between companies. Values are the total amount of sales in euros for each company, before and after the discounts started, not scaled. I expect the log of the values to be approximately normal. ba is coded as 0 for before and 1 for after.
I think traditionally someone would do a paired t-test and call it a day.
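For comparison, here is a minimal sketch of what I mean by the paired t-test, done on the log scale because I expect the logs to be roughly normal. The numbers are made up for five hypothetical companies (I can't share the real data), and the variable names are mine.

```python
import math
import statistics

# Made-up totals in euros for five hypothetical companies (not the real data).
before = [52_000.0, 71_000.0, 64_500.0, 80_200.0, 47_300.0]
after = [58_000.0, 69_500.0, 75_000.0, 91_000.0, 55_100.0]

# Paired differences on the log scale: log(after) - log(before) per company.
log_diff = [math.log(a) - math.log(b) for a, b in zip(after, before)]

n = len(log_diff)
mean_diff = statistics.mean(log_diff)
# Standard error of the mean difference (sample sd uses n - 1).
se = statistics.stdev(log_diff) / math.sqrt(n)

# Paired t statistic with n - 1 degrees of freedom.
t_stat = mean_diff / se

print(f"mean log difference: {mean_diff:.3f}")
print(f"t statistic: {t_stat:.2f} on {n - 1} degrees of freedom")
```

The t statistic would then be compared against a t distribution with n - 1 degrees of freedom.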
This is what I did with Stan:
\mu_{i} = \beta_{0} + u_{comp_{i}} + \beta_{1} \cdot period_{i}
totamount_{i} \sim LogNormal(\mu_{i}, \sigma_{e}).
data {
  int<lower=1> N;                  // number of data points
  int<lower=1> J;                  // number of companies
  real<lower=0> ta[N];             // total amount for each company in each condition
  real<lower=0, upper=1> ba[N];    // predictor: 0 = before, 1 = after
  int<lower=1, upper=J> subj[N];   // company id
}
parameters {
  vector[2] beta;                  // intercept and slope
  vector[J] u;                     // company intercept
  real<lower=0> sigma_e;           // error sd
  real<lower=0> sigma_u;           // company sd
}
model {
  real mu;
  // priors
  u ~ normal(0, sigma_u);
  for (i in 1:N) {
    mu = beta[1] + u[subj[i]] + beta[2] * ba[i];
    ta[i] ~ lognormal(mu, sigma_e);
  }
}
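To check that I understand what this model implies, here is a rough forward simulation of it in Python. It is only a sketch: the parameter values are hypothetical numbers I picked for illustration, not estimates from my data, and it assumes one before row and one after row per company.

```python
import math
import random

random.seed(1)

# Hypothetical parameter values, chosen only for illustration.
J = 20                       # number of companies
beta0 = math.log(66_000)     # intercept on the log scale
beta1 = 0.2                  # effect of the after period on the log scale
sigma_u = 0.5                # sd of company intercepts
sigma_e = 0.3                # residual sd on the log scale

# Company-level intercepts: u_j ~ Normal(0, sigma_u).
u = [random.gauss(0.0, sigma_u) for _ in range(J)]

rows = []
for j in range(J):
    for ba in (0, 1):        # 0 = before, 1 = after
        mu = beta0 + u[j] + beta1 * ba
        # ta ~ LogNormal(mu, sigma_e), so log(ta) ~ Normal(mu, sigma_e).
        ta = random.lognormvariate(mu, sigma_e)
        rows.append((j + 1, ba, ta))   # (subj, ba, ta)

N = len(rows)
print(N)
print(rows[:2])
```

Fitting the Stan model to data simulated this way and checking whether it recovers the chosen values would be one way to test the code before trusting it on the real data.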
Diagnostics are all good (I used shinystan to check the chains, sample sizes, and Rhat).
\beta_{0} has a value which makes sense (around 66k after exponentiating); \beta_{1} has a mean of 1.1, i.e. exp(1.1) ≈ 3, and zero is not in the distribution. Is it correct to say discounts add, on average, 3 euros to total volume? I intend to add more variables to the model, but wanted to know if I’m in the ballpark here. From what I understand, using flat priors here is not a great idea, but I’m not sure what is appropriate. Some regularization is probably warranted.
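One way to sanity-check the interpretation is to work through the arithmetic. Since \mu_{i} is on the log scale, exp(\mu_{i}) is the median of the lognormal, so as far as I can tell exp(\beta_{1}) should act as a multiplier on the median rather than as a number of euros added. The sketch below uses the rounded values quoted above; the \beta_{0} value is back-computed from "around 66k", so it is an approximation.

```python
import math

# Rounded posterior means quoted above; beta0 is back-computed from
# "around 66k", so it is an approximation, not a reported estimate.
beta0 = math.log(66_000)
beta1 = 1.1

# Under LogNormal(mu, sigma), exp(mu) is the median of the distribution.
median_before = math.exp(beta0)            # ba = 0
median_after = math.exp(beta0 + beta1)     # ba = 1

ratio = median_after / median_before       # equals exp(beta1)
difference = median_after - median_before  # depends on the baseline

print(f"ratio of medians: {ratio:.2f}")
print(f"difference in medians: {difference:,.0f} euros")
```

If that reading is right, the ratio is the same for every company, while the difference in euros depends on each company's baseline.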