Okay, so I was directed to the explanation for this by @tjmahr and given some code for setting the prior from @Solomon. Thanks so much for taking a look and helping out!
As is clearly stated in the docs (but I somehow missed), setting a prior on class = Intercept
in brms does not set a prior on the actual intercept of the linear model, but rather on the temporary intercept of the centered design matrix. This is explained here under the heading “Parameterization of the population-level intercept.”
Using the code below results in the expected behaviour:
prior_check2 <- brm(
  y ~ 0 + Intercept + x,
  data = df,
  prior = c(
    prior(normal(8, 3), class = "b", coef = "Intercept"),
    prior(normal(0, 2), class = "b", coef = "x1"),
    prior(normal(0, 3), class = "sigma")
  ),
  sample_prior = "only"
)
plot(conditional_effects(prior_check2))
I’m doing my best to understand the nuances of why placing the prior on the temporary intercept produces the behaviour above, and I’ll update if/when that happens.
EDIT: This is a further explanation of why this happens. I’m hoping that it may be useful to others in the future. I think it was certainly worth the time I took to figure it out.
TL;DR - The prior you set via class = Intercept
in my example is actually a prior on the grand mean of y, just as it would be if you had sum-coded the binary factor as (-0.5, 0.5) (because, after centering, that is exactly what you are doing). This holds only for my example, in which the data were balanced. Unbalanced counts of the levels of a categorical predictor would lead to slightly different behaviour.
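To see that equivalence concretely, here is a small sketch (in Python rather than R, purely for illustration, with made-up data): centering a balanced 0/1 dummy column produces exactly the -0.5/0.5 sum coding.

```python
# Centering a balanced dummy-coded column is the same as sum-coding it.
x = [0, 0, 0, 1, 1, 1]           # balanced 0/1 dummy for a binary factor
mean_x = sum(x) / len(x)         # 0.5 when the two levels are balanced
xc = [xi - mean_x for xi in x]   # centered column
print(xc)  # [-0.5, -0.5, -0.5, 0.5, 0.5, 0.5]
```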
This is the Stan code generated by brms for the prior_check
model fit in the original post.
// generated with brms 2.18.0
functions {
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int Kc = K - 1;
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector[Kc] b;  // population-level effects
  real Intercept;  // temporary intercept for centered predictors
  real<lower=0> sigma;  // dispersion parameter
}
transformed parameters {
  real lprior = 0;  // prior contributions to the log posterior
  lprior += normal_lpdf(b | 0, 2);
  lprior += normal_lpdf(Intercept | 8, 3);
  lprior += normal_lpdf(sigma | 0, 3)
    - 1 * normal_lccdf(0 | 0, 3);
}
model {
  // likelihood including constants
  if (!prior_only) {
    target += normal_id_glm_lpdf(Y | Xc, Intercept, b, sigma);
  }
  // priors including constants
  target += lprior;
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
}
The important parts to remember here are in the transformed data block, which defines Xc
as a centered version of the design matrix X
with one fewer column than the original. The values are transformed inside the for loop, copied by itself below. Because K = 2 in this model, means_X
is a single number: the mean of the second column of the design matrix, which holds our factor x
. The mean of that column is 0.5
, because it contains equal numbers of 0s and 1s. The new design matrix Xc
is then a single column, obtained by taking the original column of 0s and 1s and subtracting 0.5, turning it into a column of -0.5s and 0.5s, respectively.
for (i in 2:K) {
  means_X[i - 1] = mean(X[, i]);
  Xc[, i - 1] = X[, i] - means_X[i - 1];
}
With K = 2 and balanced data, the loop reduces to:
for (i in 2:2) {
  means_X[1] = 0.5;
  Xc[, 1] = X[, 2] - 0.5;
}
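The same transformation can be sketched in Python (for illustration only; the variable names mirror the Stan code, and the data values are made up):

```python
import numpy as np

# Design matrix with an intercept column and a balanced 0/1 dummy column.
X = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [1.0, 1.0],
              [1.0, 1.0]])
K = X.shape[1]                   # K = 2, as in the model above
means_X = X[:, 1:].mean(axis=0)  # column means of X, skipping the intercept
Xc = X[:, 1:] - means_X          # centered design matrix, one fewer column
print(means_X)                   # 0.5 for the balanced dummy column
print(Xc.ravel())                # -0.5s and 0.5s
```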
The line that fits the model passes the response variable Y
, the centered design matrix Xc
, the temporary Intercept
, the population-level effects b
, and sigma
. This means that the population-level effect of x
on y
is estimated with the sum-coded version of my dummy-coded variable, and the Intercept estimated is the grand mean of y across the two levels.
target += normal_id_glm_lpdf(Y | Xc, Intercept, b, sigma);
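Here is a hedged sketch of why the intercept of a model with a centered predictor is the grand mean of y. It uses ordinary least squares in Python as a stand-in for the Bayesian fit, on simulated data, so the numbers are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.repeat([0.0, 1.0], 50)            # balanced binary predictor
y = 8 + 2 * x + rng.normal(0, 1, 100)    # made-up response
xc = x - x.mean()                        # centered predictor: -0.5 / 0.5
slope, intercept = np.polyfit(xc, y, 1)  # least-squares fit of y ~ 1 + xc
# With a centered predictor, the fitted intercept is exactly mean(y).
print(np.isclose(intercept, y.mean()))   # True
```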
In the generated quantities block, we get back b_Intercept, the average of y for the first level of the factor. This is computed by taking the temporary intercept, which is the grand mean, and subtracting the dot product of the column means and the coefficients:
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
}
Because my data were balanced, the temporary intercept here is the grand mean of the two categories. We then subtract half of the estimated effect (means_X is 0.5, so dot_product(means_X, b) is 0.5 * b) to get back the b_Intercept
reported by the model, which is the average of y
when x = 0
, i.e., the first level of the factor.
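That recovery can be checked numerically with the same least-squares stand-in (Python, simulated data, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.repeat([0.0, 1.0], 50)            # balanced binary predictor
y = 8 + 2 * x + rng.normal(0, 1, 100)    # made-up response
means_X = np.array([x.mean()])           # 0.5 for balanced data
xc = x - means_X[0]
b, Intercept = np.polyfit(xc, y, 1)      # slope and temporary intercept
# Same formula as the generated quantities block:
b_Intercept = Intercept - np.dot(means_X, np.array([b]))
# b_Intercept recovers the mean of y at the first factor level (x = 0).
print(np.isclose(b_Intercept, y[x == 0].mean()))  # True
```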
This means that, for this example, I could have treated setting a prior on class = Intercept
as if I were setting a prior on the grand mean of the two levels. If I’m understanding this correctly, however, if my levels were unbalanced, then the temporary intercept would be closer to the mean of whichever level had more observations.
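A quick numerical check of the unbalanced case (same least-squares stand-in, simulated data, Python for illustration): the centered intercept is still the mean of y, which is now a weighted mean pulled toward the more common level.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.concatenate([np.zeros(80), np.ones(20)])  # unbalanced: 80 vs 20
y = 8 + 2 * x + rng.normal(0, 1, 100)            # made-up response
xc = x - x.mean()                                # mean(x) is 0.2, not 0.5
slope, Intercept = np.polyfit(xc, y, 1)
print(np.isclose(Intercept, y.mean()))           # True: still mean(y)
# mean(y) sits closer to the group mean of the more common level (x = 0).
print(abs(Intercept - y[x == 0].mean()) < abs(Intercept - y[x == 1].mean()))
```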