Specifying Group Level Priors in brm function

Hi Stan Experts,

I am working with customer sales data in which sales (continuous) is my dependent variable and activity in different marketing channels provides the independent variables. I am trying to estimate a coefficient for each channel to measure its effect.

Additionally, based on behavior, each customer is categorised as tier1, tier2, tier3 or tier4. Before looking at the data, I know that sales from customers in different tiers are not exchangeable: there is a clear distinction in sales behavior between tiers.

Hence, I have added random effects at the tier level. Below is my model call:

bmod1 <- brm(Sales ~ Comp + FTO + CPL + SPL
             + CC + ENG + IMP + PSO + (1|tier) + (0+FTO|tier)
             + (0+Comp|tier) + (0+SPL|decile) + (0+CC|tier)
             + (0+ENG|tier) + (0+IMP|tier) + (0+PSO|tier),
             data = mdb_14, family = gaussian(), prior = prior2,
             warmup = 1000, iter = 5000, chains = 4,
             control = list(adapt_delta = 0.98), seed = 150, thin = 2,
             cores = parallel::detectCores())

Furthermore, I have results (both group-level and population-level coefficients) from the previous year’s random-effects regression model (not Bayesian), and I am planning to use those results as priors for my current-year model.

I need some help with setting up the priors. Right now I have specified priors at the population level as:

prior2 <- c(
  prior(normal(0.001,0.28), class = Intercept),
  prior(normal(-1.14,0.23), class = b, coef = Comp),
  prior(normal(0.24,0.18), class = b, coef = FTO),
  prior(normal(0.25,0.005), class = b, coef = CPL),
  prior(normal(0.0585,0.0022), class = b, coef = SPL),
  prior(normal(0.31,0.006), class = b, coef = CC),
  prior(normal(0.0370,0.0080), class = b, coef = ENG),
  prior(normal(0.55,0.02), class = b, coef = IMP),
  prior(normal(0.2,0.02), class = b, coef = PSO),
  prior(cauchy(0,10), class = sigma)
)

However, as mentioned in Bayesian Data Analysis by Andrew Gelman, we can only provide a common prior if the groups are exchangeable, which is clearly not true in my case, as different groups behave differently.

I want to know how I can provide group-level priors to my model. For example:
Prior : FTO_tier1 = normal(0.24,0.18)
Prior : FTO_tier2 = normal(0.34,0.2)
Prior : FTO_tier3 = normal(0.45,0.01)
Prior : FTO_tier4 = normal(0.8,0.1)

Does the brm function allow us to provide group-level priors?

Please let me know if you would need additional details. Thanks in advance for your help :)

Sorry for not getting to your inquiry earlier. It is relevant and well written.

I don’t think that is possible. Note that, by definition of the model, the FTO_tierX effects have mean zero and share the same standard deviation, so it doesn’t make sense for them to have “different priors”. If you want them to have this structure, you can always model FTO*tier, for which you can set those priors.

Nevertheless, the best option would IMHO be to build a model that fits both the old and new data at the same time, accounting for possible drift of the parameters between years. Assuming year is a factor with two levels, you would have something like:

Original, single year model:

Sales ~ Comp + FTO + (0 + CC | tier)

Two year model:

Sales ~ (Comp + FTO | year) + (0 + CC | tier * year)

(there are obviously other options to do this).

Does that make sense?

Hello @martinmodrak,

Thanks for your comments. Just a few queries from your answer:

  • “you can always model FTO*tier in which you can set those priors.” - It would be very helpful if you could explain how I can model FTO*tier. Could you please help me with the syntax?

  • Sales ~ Comp + FTO + (0 + CC | tier): in the syntax you suggested, does the brm function allow us to specify CC only as a random effect and not as a fixed effect?

  • Lastly, my model is taking very long to fit. I tested the model from my question with 4.3k records and it took 15 hours. I am working on a Linux-based server with 128 cores of shared computing. Is this long runtime a problem with my model or with the platform I am running on?
    Is there anything I could do to cut the runtime, or will I need a more powerful platform, probably an AWS-based one?

Thank you once again for your help, really appreciate it.

My idea was really just to use FTO*tier instead of (0+FTO|tier). This lets you set the prior, but avoids sharing information across the tiers.
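As a rough sketch of how such per-tier priors might be written, the snippet below uses the example values from the question. It assumes a per-tier slope parameterization (0 + tier + tier:FTO) rather than FTO*tier, so that each tier's FTO slope is its own population-level coefficient; that formula and the coefficient names (which assume tier levels t1 to t4) are illustrative assumptions, not something confirmed in this thread. The actual names should be checked with get_prior() on the real data.

```r
library(brms)

# Assumed formula (not from the thread): one intercept and one FTO slope per tier.
f <- Sales ~ 0 + tier + tier:FTO

# Per-tier priors using the example values from the question.
# The coef names are assumptions; confirm them with get_prior(f, data = mdb_14).
tier_priors <- c(
  set_prior("normal(0.24, 0.18)", class = "b", coef = "tiert1:FTO"),
  set_prior("normal(0.34, 0.2)",  class = "b", coef = "tiert2:FTO"),
  set_prior("normal(0.45, 0.01)", class = "b", coef = "tiert3:FTO"),
  set_prior("normal(0.8, 0.1)",   class = "b", coef = "tiert4:FTO")
)
tier_priors
```

With FTO*tier and R's default contrasts, the interaction coefficients are offsets from the first tier's slope, so priors on them would describe differences between tiers rather than each tier's own slope.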

You are allowed to include CC only as a group-level effect without a population-level term, but I agree this is usually not a good idea. I meant that model just to show the general pattern of converting the model, not to suggest a specific formula you should use.

The long runtime is most likely an issue with the model: it has a lot of parameters, so it is quite possible that not all of them are well informed by your data. I assume you get divergent transitions with a lower adapt_delta; you may check out “Divergent transitions - a primer” for some strategies (they apply to all sorts of problems, not only divergent transitions). Generally, I would start with a subset of the data and a smaller model, which lets you iterate quickly and understand where the problems lie. A model as big as the one you have is almost impossible to debug.
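One possible way to set up such a subset is sketched below. The data are simulated placeholders standing in for the real data frame, and the subsample size (50 rows per tier) and the reduced formula in the final comment are arbitrary assumptions chosen only for illustration.

```r
set.seed(150)

# Simulated placeholder data; in practice this would be the real data frame.
n_per_tier <- 250
toy <- data.frame(
  tier = factor(rep(c("t1", "t2", "t3", "t4"), each = n_per_tier)),
  Comp = rnorm(4 * n_per_tier),
  FTO  = rnorm(4 * n_per_tier)
)
toy$Sales <- 0.2 * toy$FTO - 1.0 * toy$Comp + rnorm(nrow(toy))

# Stratified subsample: 50 rows per tier, so every tier stays represented.
rows_by_tier <- split(seq_len(nrow(toy)), toy$tier)
keep <- unlist(lapply(rows_by_tier, function(i) i[sample.int(length(i), 50)]))
small <- toy[keep, ]
table(small$tier)

# A reduced model to start from (assumed formula; not run here):
# fit_small <- brms::brm(Sales ~ Comp + FTO + (1 | tier), data = small,
#                        chains = 2, iter = 1000)
```

Once the small model samples cleanly, predictors and group-level terms can be added back one at a time to see which one causes trouble.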

Feel free to ask here if some of the suggestions are unclear.

Best of luck with your model!

Thanks for your help Martin, this is really helpful and thanks for sharing the wonderful article.

Just a few clarifications on:

My FTO is a continuous variable and tier is a categorical variable with values t1, t2, t3 and t4. Do you mean I should create 4 dummy columns for my independent variable, say FTO_t1, FTO_t2, FTO_t3 and FTO_t4, and distribute the values from the original FTO column across these four dummies according to tier? Something like this?

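| Customer | tier | FTO | FTO_t1 | FTO_t2 | FTO_t3 | FTO_t4 |
|----------|------|-----|--------|--------|--------|--------|
| 111      | t1   | 10  | 10     | 0      | 0      | 0      |
| 112      | t2   | 11  | 0      | 11     | 0      | 0      |
| 113      | t3   | 12  | 0      | 0      | 12     | 0      |
| 114      | t4   | 3   | 0      | 0      | 0      | 3      |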

And then feed these 4 columns in my model?

I am working on a business problem in which many marketing channels were deployed. I am required to study the impact of all of these channels, so I am afraid the ideal model would need all of them as variables. Additionally, I have observed that when I do not include random effects for these variables, my code takes only 20-30 minutes to give results, compared to 15 hours when all of them have random effects. I would love to get your thoughts on this.

Thank you once again :)

If I understand you correctly, then this is exactly what using FTO*tier does for you automatically (if tier is a factor or character).
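To see which columns R builds from such a formula, a small base-R check with model.matrix can help. The sketch below uses the four example customers from the table above. Note that with R's default treatment contrasts, FTO*tier is coded as a baseline FTO slope plus per-tier offsets, while the alternative formula 0 + tier + tier:FTO (an illustrative choice, not something proposed in this thread) yields columns identical to the hand-made FTO_t1 ... FTO_t4.

```r
# Toy data mirroring the four example customers
d <- data.frame(
  Customer = c(111, 112, 113, 114),
  tier = factor(c("t1", "t2", "t3", "t4")),
  FTO = c(10, 11, 12, 3)
)

# FTO*tier: intercept, baseline FTO slope (tier t1), tier offsets,
# and FTO-by-tier offsets relative to t1
mm_star <- model.matrix(~ FTO * tier, data = d)
colnames(mm_star)

# Alternative parameterization: one intercept and one FTO slope per tier
mm_sep <- model.matrix(~ 0 + tier + tier:FTO, data = d)
colnames(mm_sep)

# The last four columns match the hand-made FTO_t1 ... FTO_t4 columns
slopes <- unname(mm_sep[, 5:8])
slopes
```

Both formulas describe the same set of fitted lines; they differ only in what each coefficient means, which matters when attaching a prior to a specific coefficient.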

Unfortunately, there is no guarantee that the questions we ask are answerable by the data we have. It is possible that an ideal model would contain all of those predictors but that in practice we don’t have enough data to learn about all of them, so we have to make some compromises. The model taking so much longer is probably a sign that this is indeed the case, as underdetermined models can have many pathologies that make sampling difficult.

Hope that helps at least a little bit.