Upper bound for individual brms priors

I’m trying to set the upper bound to zero for some - but not all - fixed effects parameters in a brms glmm. I’ve seen the work around described for constraining the intercept here, but I can’t figure out how to get that working for other parameters. I tried following that approach and modified the $model text to try to constrain the second coefficient to match what is produced using “ub = 0” in set_prior:

transformed parameters {
  real lprior = 0;  // prior contributions to the log posterior
  lprior += normal_lpdf(b[1] | 0,10);
  lprior += normal_lpdf(b[2] | 0,10)
    - 2 * normal_lcdf(0 | 0,10);
  lprior += student_t_lpdf(Intercept | 3, 12.5, 2.5);
  lprior += student_t_lpdf(sigma | 3, 0, 2.5)
    - 1 * student_t_lccdf(0 | 3, 0, 2.5);
}

Refitting the model and updating the model object didn’t seem to implement the constraint.

I’ve seen a work around for nonlinear models here, but I need to stay within a glmm framework. That approach is also described here in a feature request for individual parameter constraints in brms.

Any help/ideas would be greatly appreciated. Thanks!

Ah, I see that there needs to be corresponding edit to the parameters section so the stan code to include <upper=0>. That seems to make sense for individual parameters, but it’s not clear how to do that when the population-level coefficients are defined as a vector, e.g. vector[Kc] b.

Based on the stan user guide, it seems that perhaps a vector of bounds can be applied? It’s unclear to me how to incorporate that into a hacked brms model…

Have you tried vector<upper=0>[Kc] b; ?

Thanks for the suggestion. That sets the upper bound for all coefficients (just implemented and confirmed), whereas I only want to constrain one coefficient. I may look more into the nonlinear work around, but I have some concerns about that working correctly when I eventually apply it to a much more complicated model.

Ah, I misunderstood your question. And you tried this from the Stan manual as you link? 24.4 Vectors with varying bounds | Stan User’s Guide

data {
  int Kc;
  vector[Kc] U;  // upper bounds
  // ...
}
parameters {
  vector<upper=U>[Kc] b;
  // ...
}

The other thing you can always do if you don’t have a large number of coefficients is to not use a vector and just write them out as individual parameters.

I haven’t been able to get that working. Here’s what I’ve tried.:

  1. Set up empty model object
mod <- brm(Y ~ X + A, data = df,
                  prior = c(set_prior("normal(0,10)", class = "b", coef = "X"),
                            set_prior("normal(0,10)", class = "b", coef = "A")),
                  empty = TRUE)
  1. Edit stan model
functions {
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int Kc = K - 1;
  vector[Kc] U;  // upper bounds - MODIFIED LINE
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector<upper=U>[Kc] b;  // population-level effects - MODIFIED LINE
  real Intercept;  // temporary intercept for centered predictors
  real<lower=0> sigma;  // dispersion parameter
}
transformed parameters {
  real lprior = 0;  // prior contributions to the log posterior
  lprior += normal_lpdf(b[1] | 0,10); 
  lprior += normal_lpdf(b[2] | 0,10)
    - 2 * normal_lcdf(0 | 0,10); //  - MODIFIED LINE
  lprior += student_t_lpdf(Intercept | 3, 12.5, 2.5);
  lprior += student_t_lpdf(sigma | 3, 0, 2.5)
  - 1 * student_t_lccdf(0 | 3, 0, 2.5);
}
model {
  // likelihood including constants
  if (!prior_only) {
    target += normal_id_glm_lpdf(Y | Xc, Intercept, b, sigma);
  }
  // priors including constants
  target += lprior;
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
}
  1. Define $model:
mod$model <- read_file("stan.txt")
  1. Make stan data and include variable U:
standata <- make_standata(Y ~ X + A, data = df,
                          stanvars = stanvar(c(1e6, 0), "U"))
  1. Fit model and update:
mod$fit <- stan(model_code = mod$model, data = standata)
mod <- update(mod, recompile = FALSE)

It gives the following error indicating that it doesn’t like that U is a vector:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
Expression denoting real required; found type=vector.
 error in 'model21e14f69601_b8696f66e7ac5de8a19e6cddb1c80098' at line 22, column 15
  -------------------------------------------------
    20: }
    21: parameters {
    22:   vector<upper=U>[Kc] b;  // *MODIFIED* population-level effects
                      ^
    23:   real Intercept;  // temporary intercept for centered predictors
  -------------------------------------------------

And sorry if this all looks pretty sloppy. I’m rather unfamiliar with all things stan and C. I had been fitting models in MCMCglmm, but brms seems more promising for constraining parameters.

This should be in the data block. You need to put this in the Stan data list to pass to the model.

Thanks! I just tried that (needed to use K -1, as Kc is specified lower down), but it still gave them same error about not expecting a vector.

However, it looks like I managed to get it working using the nonlinear approach that specifies variables and coefficients more explicitly:

mod_nl <- brm(
  bf(Y ~ int + x*X + a*A,
     # int + x + a ~ 1,
     int ~ 1,
     x ~ 1,
     a ~ 1,
     nl = TRUE),
  family = gaussian(),
  data = df,
  prior = c(
    prior(normal(0, 10), class = "b", nlpar = "int"),
    prior(normal(0, 10), class = "b", nlpar = "x"),
    prior(normal(0, 10), class = "b", nlpar = "a", ub = 0.2)
  )
)

I think this might be the best approach? I’ll try it with my more complicated model using actual data and see if I can still keep things working.

Thanks a ton!

Probably so if you want to use brms. If you want to use Stan and have so few predictors, then you can just write them out individually in the Stan model instead of using a vector of coefficients.

Yeah, I got the same error as you with a simple example. I had to manually construct the transform to get it to work. Not sure why…

library(rstan)
library(brms)


N <- 1000
a <- 2
x1 <- rnorm(N, 0, 1)
x2 <- rnorm(N, 0, 1)
x3 <- rnorm(N, 0, 1)
mu <- a + 1.5*x1 + 1*x2 + 4*x3
y <- rnorm(N, mu, 1)

d <- cbind.data.frame(y = y, N = N, x1 = x1, x2 = x2, x3 = x3)

standata <- make_standata(y ~ 0 + Intercept + x1 + x2 + x3, data=d)
standata$U <- c(20,30,10,30)
standata$L <- c(-20,-30,-10,-30)


#This gives SYNTAX ERROR "Expression denoting real required; found type=vector."
stan_code <- "
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  vector[K] U;  // vector of upper bounds
}
parameters {
  vector<lower=L, upper=U>[K] b;  // population-level effects
  real<lower=0> sigma;  // dispersion parameter
}
transformed parameters {
  real lprior = 0;  // prior contributions to the log posterior
  lprior += student_t_lpdf(sigma | 3, 0, 4.5)
    - 1 * student_t_lccdf(0 | 3, 0, 4.5);
}
model {
  // likelihood including constants
    target += normal_id_glm_lpdf(Y | X, 0, b, sigma);
  // priors including constants
  target += lprior;
}
"

#fit model
m1 <- stan(model_code = stan_code, data = standata,
             chains = 1, iter = 2000, warmup = 1000, 
             thin = 1, cores = 4)


#This appears to work
stan_code2 <- "
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  vector[K] U;  // vector of upper bounds
  vector[K] L;  // vector of lower bounds
}
parameters {
  vector<lower=0, upper=1>[K] b_raw;  
  real<lower=0> sigma;  // dispersion parameter
}
transformed parameters {
  vector[K] b = L + (U - L) .* b_raw;
  real lprior = 0;  // prior contributions to the log posterior
  lprior += student_t_lpdf(sigma | 3, 0, 4.5)
    - 1 * student_t_lccdf(0 | 3, 0, 4.5);
}
model {
  // likelihood including constants
    target += normal_id_glm_lpdf(Y | X, 0, b, sigma);
  // priors including constants
  target += lprior;
}
"

#fit model
m2 <- stan(model_code = stan_code2, data = standata,
             chains = 1, iter = 2000, warmup = 1000, 
             thin = 1, cores = 4)

You might find this blog post useful:

It discusses exactly how to do this (providing several options).

1 Like

Thanks - helpful article!

I feel pretty good going with the nonlinear brms approach for now. I fit my full model and found that it essentially perfectly matches the same model fit with MCMCglmm. I was then able to constrain two of the parameters to be <= 0 using ub = 0 and everything seems to be working as expected.

The specific model is a bernoulli/logit plant survival glmm based on size, 3 environmental variables (E1,E2,E3) included as 2nd order polynomials, and 2 measures of plant density (D1,D2) with coefficients constrained to be <= 0.

I may make the priors wider than 10sd; still testing things out but it all looks good.

mod <- brm(
  bf(Survival~ int + b1*Size +
       b2*E1 + b3*E1^2 +
       b4*E2 + b5*E2^2 +
       b6*E3 + b7*E3^2 +
       b8*log1p(D1) +
       b9*log1p(D2),
     int ~ 1 + (1 | Location) + (1 | Year.Loc),
     b1 + b2 + b3 + b4 + b5 + b6 + b7 + b8 + b9 ~ 1,
     nl = TRUE),
  family = bernoulli(link = "logit"),
  data = data,
  chains = 1,
  prior = c(
    prior(normal(0, 10), class = "b", nlpar = "int"),
    prior(normal(0, 10), class = "b", nlpar = "b1"),
    prior(normal(0, 10), class = "b", nlpar = "b2"),
    prior(normal(0, 10), class = "b", nlpar = "b3"),
    prior(normal(0, 10), class = "b", nlpar = "b4"),
    prior(normal(0, 10), class = "b", nlpar = "b5"),
    prior(normal(0, 10), class = "b", nlpar = "b6"),
    prior(normal(0, 10), class = "b", nlpar = "b7"),
    prior(normal(0, 10), class = "b", nlpar = "b8", ub = 0),
    prior(normal(0, 10), class = "b", nlpar = "b9", ub = 0)
  )
)