How to add extra prior to brms


I want to add an additional prior to my brms model. So for example if this is my model:

fit1 <- brm(out_fvc ~ days_from_baseline * diagnosis + gender +
    cohort + (days_from_baseline | uin), data = df2, chains=4, iter = 100)

I want to add a prior that constrains the sum of the population + random intercept, or the population + random slope. I know I can do things like this with stanvars → for example:

svars <- stanvar(scode = "sd_1 + Intercept ~ normal(0, 0.1);\n",
    block = "model")

I think this constrains the sum of the population + random intercept - but my question is which variables in the stancode correspond to the population + random slope ? Looking at the code below I think the population slope (i.e days_from_baseline) is represented by so element of b, but I’m not sure which variable represents the random slope specified by (days_from_baseline | uin) ? Can any decipher the brms code in this respect? Many thanks.

// generated with brms 2.16.3
functions {
 /* compute correlated group-level effects
  * Args: 
  *   z: matrix of unscaled group-level effects
  *   SD: vector of standard deviation parameters
  *   L: cholesky factor correlation matrix
  * Returns: 
  *   matrix of scaled group-level effects
  matrix scale_r_cor(matrix z, vector SD, matrix L) {
    // r is stored in another dimension order than z
    return transpose(diag_pre_multiply(SD, L) * z);
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  int<lower=1> J_1[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_1;
  vector[N] Z_1_2;
  int<lower=1> NC_1;  // number of group-level correlations
  int prior_only;  // should the likelihood be ignored?
transformed data {
  int Kc = K - 1;
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
parameters {
  vector[Kc] b;  // population-level effects
  real Intercept;  // temporary intercept for centered predictors
  real<lower=0> sigma;  // dispersion parameter
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  matrix[M_1, N_1] z_1;  // standardized group-level effects
  cholesky_factor_corr[M_1] L_1;  // cholesky factor of correlation matrix
transformed parameters {
  matrix[N_1, M_1] r_1;  // actual group-level effects
  // using vectors speeds up indexing in loops
  vector[N_1] r_1_1;
  vector[N_1] r_1_2;
  // compute actual group-level effects
  r_1 = scale_r_cor(z_1, sd_1, L_1);
  r_1_1 = r_1[, 1];
  r_1_2 = r_1[, 2];
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] mu = Intercept + rep_vector(0.0, N);
    for (n in 1:N) {
      // add more terms to the linear predictor
      mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_1_2[J_1[n]] * Z_1_2[n];
    target += normal_id_glm_lpdf(Y | Xc, mu, b, sigma);
  // priors including constants
  target += student_t_lpdf(Intercept | 3, 3.1, 2.5);
  target += student_t_lpdf(sigma | 3, 0, 2.5)
    - 1 * student_t_lccdf(0 | 3, 0, 2.5);
  target += student_t_lpdf(sd_1 | 3, 0, 2.5)
    - 2 * student_t_lccdf(0 | 3, 0, 2.5);
  target += std_normal_lpdf(to_vector(z_1));
  target += lkj_corr_cholesky_lpdf(L_1 | 1);
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
  // compute group-level correlations
  corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
  vector<lower=-1,upper=1>[NC_1] cor_1;
  // extract upper diagonal of correlation matrix
  for (k in 1:M_1) {
    for (j in 1:(k - 1)) {
      cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
1 Like

To put this question in another way:

I want to specify a prior of the form:

pop_intercept ~ normal( 3 - rand_intercept, 1 );
rand_intercept  ~ normal( 0, 0.5);

How can I do this in brms? (I know this is unusual prior specification, I want to explore the prior predictive checks etc)

I think there is a potential misunderstanding: the random intercepts/slopes already have their hierarchical priors set. They are represented internally via the z_1 parameters (which are not scaled and have a normal(0,1) prior), but the actual intercepts/slopes are stored in r_1. The manipulation of the correlation matrix makes the effective prior something like r_1 ~ multi_normal_cholesky(0, sd_1 * L_1) (I didn’t check this is exactly correct, but the important point is that there is an implied multivariate normal prior on r_1 with zero mean vector and where the covariance matrix is fitted).

So if you want to constrain the sum via a stanvar, you need to work with the r_1 matrix. However, this will always mean you put two priors on r_1 at once and this is quite likely to have unintended consequences.

The sd_1 vector gives the standard deviations of the group terms (which form the covariance matrix together with the correlations) and I don’t think you can give a very natural interpretation to quantities like sd_1 + Intercept

This is IMHO incoherent, because pop_intercept is a single number, but rand_intercept is a vector. So I presume I don’t understand what exactly should the prior do and I think there might be some conceptual confusion.

Usually best way to make progress in such problems is to try to write R/Python/… code to simulate the data you expect to see - this should force you to be explicit about how are the random and population intercepts tied and we can then discuss how to mimic that in brms.

1 Like

Thanks Martin!

I know I’m trying to override the default - not a misunderstanding, I’m doing this on purpose to test some ideas :)

I have experimented in base Stan code, but I was having trouble porting that to brms. The base Stan code is:

data {
  int<lower=0> N;      // Number of observations
  vector[N] x;         // Predictors
  vector[N] y;         // Observations

  // Number of individual contexts in hierarchy
  int<lower=0> K;
  // Individual context from which each observation is generated
  int<lower=1, upper=K> indiv_idx[N]; 

parameters {
  real alpha; // global intercept
  real beta; // global intercept

  vector[K] z1; // group intercept
  vector[K] z2; // group slope

  real<lower=0> sigma; // constrained to be positive

model {
  vector[N] mu; // declaring a mu vector
  // priors
  z1 ~ normal(0, 0.75);
  alpha ~ normal( 3 - z1, 1 );
  z2 ~ normal(0, 0.75);
  beta ~ normal(0 - z2, 0.1);
  sigma ~ normal(0, 1); 
  // likelihood
  //for(i in 1:N) {
  //  mu[i] = alpha + z1[indiv_idx[i]] + beta*x[i] + z2[indiv_idx[i]]*x[i];
  //y ~ normal(mu, sigma);


This was fine for trying a few things, but I needed to play in brms next as I’m not good at doing ppc’s and such from base stan.

Brilliant thanks! I’ll try that. Don’t worry these are very much intended consequences!

Actually, I’ve come around to thinking the standard way of setting priors for ‘linear mixed models’ is incoherent. Why? Well I’m still thinking that out and playing with these prior specifications is helping me to think about it!

1 Like

The problem with expressing your ideas in a Stan model is that it lets you express things that don’t correspond to any simulator, i.e. to any generative process, that’s why it’s helpful to try to write the code as pure simulation (either in Stan’s generated quantities style or in R).

Note that in the code you shared, z1 and z2 don’t correspond to “random” intercept/slope - they are just another “fixed” effect/slope. If they were to be “random”, you would have something like:

 z1 ~ normal(0, sd1);
sd1 ~ normal(0, 1);

I.e. there is a hierarchical prior.

Also, in the Stan code, you still have the problem of using 3 - z1 (a vector) as the prior mean for alpha (a single number), and I have no idea how that could be made to work. Maybe you meant something like alpha ~ normal( 3 - mean(z1), 1 ); ? (still not sure how that would work, but is at least a valid statement).

1 Like