Setting prior on interaction term

Hi, first time bayesian here. I’m looking for help setting priors for a regression with an interaction term. I have data on activity space area (i.e., area in km2 of polygon representing where kids spend time), sex, and age. Prior work reported that activity space size did not differ for young kids, but got bigger with age for boys and shrank for girls. The thinking is that the onset of puberty leads to the shrinking of girls’ social environments in some contexts.

Anyway, that study reported the following activity space areas:

6.33 sq miles for younger girls
2.63 sq miles for older girls
3.79 sq miles for younger boys
7.91 sq miles for older boys

Our area data represent sq km, not sq miles, and we used more precise (I think) methods to capture the data. But I’m wondering how to incorporate information from this prior study into the model (or if I should). I want to test the hypothesis that there is a significant interaction between age and sex such that activity spaces get bigger with age for boys, but shrink for girls.

Here’s the data, my approach in {lmer}, and my default approach in {brms}. We sampled by school so I include it here as a random effect.

# packages

# load data
gist <- source_gist("9518f647b0a6b070cf36ab6fc46d670d", filename = "as_brms.R")
dat <-[1])
names(dat) <- gsub("value.", "", names(dat))

# create indictor for males and center age
dat <- 
  dat %>% 
  mutate(y_male = ifelse(y_female==1, 0, 1),
         y_age_c = y_age - mean(y_age)) # center age

# lmer
  m1 <- lmer(as_areaK2 ~ y_male*y_age_c + (1|school), data=dat)
# brms
  m2 <- brm(as_areaK2 ~ y_male*y_age_c + (1|school), data=dat)

It appears that boys’ activity spaces are actually smaller with age compared to females’.

So two questions:

  1. Should I include this prior information?
  2. What are some ways to go about it?

I’m hoping that this is a nice, simple example for me to learn a bit about BDA.

R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14.5


Small update: I found code from Statistical Rethinking, brms/ggplot edition by @Solomon helpful in plotting the interaction.

dat_factor <-
    m2$data %>% 
    mutate(y_male = factor(y_male)) 
  m2_factor <- update(m2, newdata = dat_factor)
  m2.p <- plot(marginal_effects(m2_factor))
  m2.p$`y_age_c:y_male` + theme_minimal()

1 Like

I’m sure some more knowledgeable folks will eventually drop by, but for starters

a) Definitely shouldn’t base your prior on the results in your current study/data. You can change priors (e.g. make them more informative) if the model isn’t identifiable or there are sampling issues, sure, but you don’t want to look at your outcome and then set the prior to be more in line with that
b) At the same time incorporating data from prior studies is an advantage of the Bayesian framework, so you COULD set priors such that boys have a slight positive tendency over girls. But it would probably be safer to just give them the same prior if you have enough data and there isn’t a clear scientific consensus.

I think a good start is to use the get_prior() function to see what priors are involved in your model. Then you can set some weakly informative priors and run the model with the option ‘sample_prior="only’. This will sample from the prior (i.e. what does my model think is reasonable before any data). If you get absurdly high betas or impossible associations you may want to adjust the priors. You can even take the delta from the prior study as a reasonable ballpark for what sorts of sex effects you may consider reasonable. You can include this prior predictive simulation data in your paper as justification.

Then you can run your actual analysis. Once you have the posterior, you can plot the interaction effects, like you have, or you can compare conditions using hypothesis() or the compare_levels() function from tidybayes.

Hope that gets you started! I’d suggest McElreath’s course / book in general. There are also plenty of good threads on here on priors, interactions, etc.

Good luck!


Hey Eric,

Welcome to Stan Forums!

(1) Should I include this prior information?

That’s a big question and depends on some combination of the quality of the prior data with respect to the data you intend to collect, the way you view the role of priors, and how you suspect your intended audience will react to priors.

To the extent the prior data are unrepresentative of those you’d like to collect, I’d be weary. The converse is true if you anticipate collecting similar data.

If you view priors as the ability to make a cumulative science, using them is great! If you view priors as encapsulating your subjective beliefs, I suppose that depends on how effectively the prior data swayed your research intuitions. If you view priors as nuisances, just use flat ones! And if you like the notion of priors as regularizing devices, use them that way (e.g., \text{Normal} (0, 1) on beta coefficients).

Your answers to the first two questions will be tempered by how you suspect priors will be received by those you’d like to present your findings to. It’ll also depend on how willing you are to go a couple rounds with Reviewer #2. Some of this might could be avoided with a prior sensitivity analysis.

(2) What are some ways to go about it?

Let’s presume you intend to collect very similar data in the future. You could use the lmer() point estimates and standard errors directly as the priors for the next round of data. I ran the model on my computer. Here’s the summary output I got:

Random effects:
 Groups   Name        Variance Std.Dev.
 school   (Intercept)  3.741   1.934   
 Residual             30.240   5.499   
Number of obs: 218, groups:  school, 14

Fixed effects:
               Estimate Std. Error t value
(Intercept)      3.4682     0.7818   4.436
y_male          -1.0545     0.7581  -1.391
y_age_c          0.5251     0.3419   1.536
y_male:y_age_c  -0.2635     0.4836  -0.545

Correlation of Fixed Effects:
            (Intr) y_male y_ag_c
y_male      -0.459              
y_age_c      0.023 -0.004       
y_mal:y_g_c  0.020 -0.007 -0.697

Let’s go in order, starting with the level-2 variances. Unlike lme4, brms parameterizes those as standard deviations. If you’d like your priors to be precise, you’ll have to work a bit. So first steps first. Based on the summary information alone, we don’t know the exact shape of your 3.74 (1.93) school intercept. But since we do know variances are bound to zero and above and are continuous, gamma seems like a natural starting point. If you’re willing to use 3.74 as the mode and 1.93 as the standard deviation, here’s how to get the corresponding gamma hyperparameters.

gamma_s_and_r_from_mode_sd <- function(mode, sd) {
  if (mode <= 0) stop("mode must be > 0")
  if (sd   <= 0) stop("sd must be > 0")
  rate  <- (mode + sqrt(mode^2 + 4 * sd^2)) / (2 * sd^2)
  shape <- 1 + mode * rate
  return(list(shape = shape, rate = rate))

gamma_s_and_r_from_mode_sd(mode = 3.741, sd = 1.934)

That returned:

[1] 5.561848

[1] 1.219419

My math is not good enough to know how to convert that into a standard deviation metric. But I can simulate.

tibble(sigma_2 = rgamma(1e5, shape = 5.561848, rate = 1.219419)) %>% 
  mutate(sigma = sqrt(sigma_2)) %>% 
  summarise(mean = mean(sigma),
            sd = sd(sigma))

That returned:

 mean    sd
  <dbl> <dbl>
1  2.09 0.447

If you look at sigma in a plot, it’s just a little skewed. So if you wanted to use a gamma prior with hyperparameters based on those values, you would just be lazy and use the mean as a proxy for the mode and use reuse the gamma_s_and_r_from_mode_sd() function.

gamma_s_and_r_from_mode_sd(mode = 2.088724, sd = 0.4468259)

Which returned:

[1] 23.80971

[1] 10.9204

If you wanted to be less shameful, you could use this function to compute a gamma based on a mean and an sd.

gamma_s_and_r_from_mean_sd <- function(mean, sd) {
  if (mean <= 0) stop("mean must be > 0")
  if (sd   <= 0) stop("sd must be > 0")
  shape <- mean^2 / sd^2
  rate  <- mean   / sd^2
  return(list(shape = shape, rate = rate))

To give credit, both these gamma_s_r_from...() functions are based on Kruschke’s work (see here). Anyway, you could encode this information in your brms model with something like prior(gamma(23.80971, 10.9204), class = sd). You could take a similar approach with the lme4 Residual summary. Only that’d be for the brms sigma parameter, which you’d encode in some prior statement like prior(gamma(xxx, xxx), class = sigma).

The priors corresponding to the summary output from the Fixed effects: section are easier. These are often modeled with members of the Student-t family. Since your y_male covariate is not centered at zero, it might be best to avoid the default brms intercept syntax and use, instead, the as_areaK2 ~ 0 + Intercept + ... syntax for the formula. That means that when you set the prior for the intercept, you’d use class = b. If you went bold and used a Gaussian prior for the intercept, it could look like this: prior(normal(3.4682, 0.7818), class = b, coef = Intercept). The priors for the other fixed effects would be similar. You’d just use the name of the variable in the coef = xxx part.

As to the final section, the Correlation of Fixed Effects:, your options are limited. To my knowledge, brms does not support modeling the individual correlations directly. But you do have the lkj prior. Setting it to 1 leaves it flat. Setting it to a value like 2 or 4 gently pushes your level-2 correlations away from the boundaries.

This is all assuming you leave the data in an unstandardized metric and assuming you want to directly encode the information from the prior model into the future one. One way to use the information in a more casual way would be to multiply all the standard deviation hyperparameters by a factor of 2 or so (e.g., prior(normal(3.4682, 1.5636), class = b, coef = Intercept). You could also take a weakly-regularizing approach by standardizing all the data beforehand and just slap something like normal(0, 1) on all parameters other than the level-2 correlations.

For more general ideas on priors, check out the Stan prior wiki.


Thanks @HugoBotha and @Solomon. These forums are such great resources because folks like you take considerable time to write helpful replies.

@Solomon I learned a lot from your post in particular. Thanks for walking me through an example. The “should I” question is a tough one for a newbie like me. In this case the study population is similar, but the methods are different. I’d say that a lot of people would believe the prior finding that girls’ social environments shrink during puberty in some contexts (it’s commonly cited in policy reports), but I don’t have strong beliefs personally. Given that I might be in the minority with weak priors, I was wondering whether I should give the previous study results a nod by using them as priors.

If so, the question comes to how. I am really thankful that I have your example for how I could use my data to create priors for future studies. Your example also helped me to understand brms output better.

In this case, however, I don’t have such output. In fact, all that is reported in the previous study is the age cat x sex means (in sq mi).

Is there an informal or approximate way to set a prior on the interaction term in my model to reflect that the previous study found girls spaces decreased in size by 58% with age, whereas boys spaces increased 108%? Again, I am not particularly swayed by this previous finding given some concerns I have about the methods, but lots of other folks see it as support for their hypothesis. So in some ways I feel like we are not starting from zero knowledge and that my model should reflect this.

1 Like

Looking back, there are a couple mistakes in my initial post which should be corrected. As you pointed out in your reply, I misunderstood parts of your example and ended up showing you how to make priors from your current data, which violated @HugoBotha’s first point: “a) Definitely shouldn’t base your prior on the results in your current study/data.” To be clear, I agree with HugoBotha without reservation. Second, I also misinterpreted the last part of the lme4 output. Though those are correlations among the fixed effects, I spoke of them as if they were correlations among random effects. As such, all my talk about lkj prior for correlations is mute in this example. The model for your current data has only one varying parameter, the intercept, and thus would have no level-2 correlation to fit.

I may need to be more careful about how late into the evening I offer opinions on the Stan Forums. sigh

As to your restated question, I think your options are limited. The basic problem is you’re trying to insert information from a single-level study into a multilevel study. In retrospect, I suspect you saw this as the major problem from the outset. If it were me, I’d just use the information to inform the general scale I’d expect to see in the data. Across groups, the square miles range from 2.63 7.91. Therefore I’d feel comfortable using something like prior(normal(4.5, 3), class = Intercept) for the level-1 intercept, given you stick with an unstandardized version of the data. Other than that, I’d be very weary of extrapolating from the four cell means you shared to your multilevel framework. There might be sophisticated ways to do so, but those would be well past my skill set.


Thanks for the clarifications, @Solomon. You answered my main question about incorporating results from the previous study. I appreciate the help!

@Eric_Green I thought I’d mention the fact that since you are doing an analysis in which the outcome variable is an area that is being used and occupied by your research participants, that outcome variable must always be positive. However, the Gaussian likelihood that you have chosen in your model is generating estimates and predictions that are implausible, i.e., that result in negative activity space areas. There are a few ways to deal with this, but one to consider would be to use a Gamma family as the likelihood in your model. Another question I have (sorry if this goes beyond the scope of your question) is whether all participants were observed for the same number of days, and whether those days equally straddle weekdays / weekends. Those are covariates to consider of course. Cool stuff!


@Brian_Wood thanks for taking the time to review and make this suggestion. I had not considered the non-zero issue!

Would the following make sense, or would you consider a different link?

m2 <- brm(as_areaK2 ~ y_male*y_age_c + (1|school), data=dat,

Dumb question, but the model results are logs and need to be exponentiated to get back to the raw scale of sq km, right? EDIT: Actually @Brian_Wood I think I misunderstood here. Reading a bit more, I’m wondering if it’s true that the coefficients from this model are multiplicative rather than additive. In other words, exp(-0.31)=0.73 indicates that the area decreases by a factor of 0.73 for a 1 unit increase in y_male, which is to say the area for boys is smaller by this factor (holding constant age and the interaction). Is that right? If so, is there a different approach that would let me talk about the slope as an increase/decrease in sqkm while still avoiding the negative area issue?

Population-Level Effects: 
               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept          0.97      0.26     0.45     1.49 1.00     1159     1767
y_male            -0.31      0.16    -0.64     0.01 1.00     4161     2939
y_age_c            0.15      0.07     0.02     0.29 1.00     3255     3119
y_male:y_age_c    -0.06      0.10    -0.25     0.14 1.00     3217     2789


dat_factor <-
    m2$data %>% 
    mutate(y_male = factor(y_male)) 
  m2_factor <- update(m2, newdata = dat_factor)
  m2.p <- plot(marginal_effects(m2_factor))
  m2.p$`y_age_c:y_male` + theme_minimal()

To answer your other question, the data come from one reference day. We had folks locate their homes on high res satellite imagery (no home address system in this setting) and complete an georeferenced activity log. We turned that log of coordinates/activities into a SD ellipse representing the activity space. The area numbers in this example represent the size of each person’s ellipse.

1 Like

Not really - if your model inolves such linear decreases it usually also allows negative values. In my experience however most values that are only positive exhibit multiplicative rather than positive effects. Also if I recollect correctly, if the effects are small the differences between multiplicative and additive may become small.

You could get something closer to linear model but still enforce positivity by using something like a Softplus link, a.k.a log1p_exp, but I don’t have any experience with it to vouch it would not give you some headaches.

The safest way is obviously to test both forms of the model and if they agree qualitatively you don’t need to worry so much which one is more correct (search for “multiverse analysis” for more thoughts of this).

1 Like