Random walk or autoregressive prior in brms

I have a categorical age variable. I’d like to add a random effect for age category, but have those random effects change smoothly with the (ordered) age category, with a random walk or autoregressive prior. Is there an easy way to do this using brms?

The closest I could think of is to use mgcv-style syntax, i.e. Y ~ s(age_cat), but I would like there to be a separate parameter for every age-level.

Any suggestions?

After stewing on this for a couple days, my best solution is to sort of “brute force” it using a combination of priors/stanvars to add the code that I need. Here’s an outline of my half-baked solution:

  1. Set the model up with random effects for each agecat, i.e. Y ~ (1|agecat)
  2. “Remove” the prior on the random effects. I think the best way to do this is just to make the default prior for for the sd really flat, like normal(0,100000).
  3. Add code for the random walk to the transformed parameters block, using stanvars.
  4. Add a soft sum-to-zero constraint across the random effects, using stanvars, to ensure identifiability (there’s still an intercept in there).

It’s a very ugly solution, so if someone has a better one it would be much appreciated.

Just checking again if anyone has an easier solution? I was hoping there was some sort of built-in functionality in brms that would allow this. @paul.buerkner any ideas?

I don’t think there is a much better way with brms than to what you do currenty. Perhaps except for the fact that I would not set a wide prior on the SD, but rather fix it using something like prior(constant(10000), class = "sd", ...). I don’t fully understand yet why you want to remove the prior on the random effects but given that this is your call, setting the SD to a very large constant value will give you just that.

Would it make more sense to treat it as a monotonic population effect, and just use the prior to regularize like you normally would see in a random effect?

You could also just code this explicitly as a random walk, which is by definition, I think, a function over discrete, but correlated, categories, with varying degrees of “lag.” In the INLA world this would be something like f(age_cat, model='rw2') for a second-order random walk, which is somewhat equivalent to the mgcv default spline; first-order random walk or AR1 relationship can be done also.

In Stan the formula might need to be coded more explicitly. There have been some posts demonstrating what a random walk might look like, I believe.

Thanks for the response. I was hoping there would be something easier, but I guess not.

Yes, and the constant prior for the SD would be preferred. The reason I want to remove the prior on the random effects is that I’m going to be putting another prior on top of it (manually, as part of step 3 above). This prior will be placed on the differences between the adjacent coefficients.

@bachlaw I don’t want to treat the random effects as monotonic, just smooth. Also I’m looking for a solution using brms… I know how to do this using regular Stan code.

@paul.buerkner - here is an update, and unfortunately I am a bit stuck.

I have successfully implemented the random walk prior in Stan (not brms). To do this, I have defined this rw1() function, curtesy of @Bob_Carpenter -

  real rw1_lpdf(vector alpha, real sigma) {
    int N = cols(alpha);
    return normal_lpdf(alpha[2:N] - alpha[1:N - 1] | 0, sigma);
  }

When I set the distribution of the random effects, instead of setting them to be the usual normal, I set them to be rw1. I also add a sum-to-zero constraint.

The problem in trying to implement this in brms is that for the formula Y ~ (1|agecat), brms automatically sets the random effects as normally distributed, and I can’t figure out how to remove that code. I tried doing what you suggested and setting the standard deviation of that normal distribution to be constant(10000), and then added the rw1 distribution on top of it, but it caused all sorts of sampling problems.

Do you have any thoughts on if there is a way to “remove” the existing normal distribution? Do you know why just setting that normal distribution to be super-flat (large sd) and then adding the second rw1 distribution on top didn’t work? I saw that you can use gr() to replace the normal with a t-distribution for the random effects, but it doesn’t allow any other distribution such as rw1.

Here’s some simulated data and my best attempt at the brms code, if you want to give it a shot:

# Simulate Data
N <- 100
d <- tibble(agecat = sample.int(8, N, replace = TRUE)) %>%
  mutate(mu = sin(pi*agecat/7),
         Y = mu + rnorm(N, sd = 1))

# Set up stanvars and priors
sv <- stanvar(scode = 'real rw1_lpdf(vector alpha, real sigma) {
    int N = num_elements(alpha);
    return normal_lpdf(alpha[2:N] - alpha[1:N - 1] | 0, sigma);
  }',
              block = "functions") +
  stanvar(scode = 'r_1_1 ~ rw1(sd_agecat_rw);', block = "model", position = "end") +
  stanvar(scode = "real<lower=0> sd_agecat_rw;", block = "parameters", position = "end") +
  stanvar(scode = "sd_agecat_rw ~ normal(0,1);", block = "model", position = "end") +
  stanvar(scode = "sum(r_1_1) ~ normal(0, N_1*.001);", block = "model", position = "end")

pr <- prior(constant(10000), class = "sd", group = "agecat")

# Fit model
fit <- brm(Y ~ (1|agecat), data = d, prior = pr, stanvars = sv)

I would recommend generating the Stan code via make_stancode, edit as you need (i.e., remove the not needed independent normal prior), fit the model with Stan directly and put all of it back into brms. See the empty = TRUE argument of brm() and corresponding example for details.

1 Like

Great solution, thanks.

Hi!

As I ran into the problem of setting up an RW prior in brms. I figured an alternative to the above is possible more straightforward?

Here is what I came up with below. This is what I will use as basis for my current project for the moment being (so things may change). Note that you have to scroll to the bottom of the snippet to see my alternative approach. The key idea is to setup a non-linear formula and split out the RW prior into its linear formula. I also choose to declare a reference age category which becomes the overall intercept of the model. This way I do not need to deal with sum to zero constraints - but be aware of the fact that this then defines the overall intercept such that one must not define an intercept in the linear predictor term.

Ping @jgellar and @paul.buerkner as FYI.

here::i_am("rw_model.R")
library(here)

library(brms)
library(dplyr)
library(tidyr)
library(posterior)

# instruct brms to use cmdstanr as backend and cache all Stan binaries
options(brms.backend="cmdstanr", cmdstanr_write_stan_file_dir=here("_brms-cache"))

# create cache directory if not yet available
dir.create(here("_brms-cache"), FALSE)

# set the R random seed of the session for reproducibility
set.seed(5886935)

## PREVIOUS APPROACH

# Simulate Data
N <- 100
d <- tibble(agecat = sample.int(8, N, replace = TRUE)) %>%
  mutate(mu = sin(pi*agecat/7), x=rnorm(N), 
         Y = mu + 2 * x + rnorm(N, sd = 1),
         )

# Set up stanvars and priors
sv <- stanvar(scode = "real rw1_lpdf(vector alpha, real sigma) {
    int N = num_elements(alpha);
    return normal_lpdf(alpha[2:N] - alpha[1:N - 1] | 0, sigma);
  }",
              block = "functions") +
  stanvar(scode = "r_1_1 ~ rw1(sd_agecat_rw);", block = "model", position = "end") +
  stanvar(scode = "real<lower=0> sd_agecat_rw;", block = "parameters", position = "end") +
  stanvar(scode = "sd_agecat_rw ~ normal(0,1);", block = "model", position = "end") +
  stanvar(scode = "sum(r_1_1) ~ normal(0, N_1*.001);", block = "model", position = "end")

pr <- prior(constant(10000), class = "sd", group = "agecat") +
    prior(normal(0, 5), class=Intercept) +
    prior(normal(0, 2), class=b)

# Fit model
fit <- brm(Y ~ 1 + x + (1|agecat), data = d, prior = pr, stanvars = sv, chains=1)


## ALTERNATIVE APPROACH

## Defines an RW prior for brms. This requires to define in brms a
## non-linear model such that the RW term is it's own linear model
## which must include the overall intercept (so set the overall
## intercept to 0 for the remaining linear model). The user can pick
## which of the categories for the random walk is used as overall
## intercept by setting the idx_inter to the respective index. The
## function returns a list with the elements prior and stanvar, which
## need to be passed to the brm call respectivley.
prior_rw <- function(idx_inter, sigma_inter, prior_scale_sigma_rw, ...) {
    pr <- prior_string(paste0("rw1(", idx_inter, ", ", sigma_inter, ", sigma_rw)"), ...) +
        prior_string(paste0("target += normal_lpdf(sigma_rw| 0, ", prior_scale_sigma_rw, ")"), check=FALSE)
    sv_rw1 <- stanvar(scode = "real rw1_lpdf(vector alpha, int idx_inter, real sigma_inter, real sigma_rw) {
    int N = num_elements(alpha);
    return normal_lpdf(alpha[2:N] - alpha[1:N - 1] | 0, sigma_rw) + normal_lpdf(alpha[idx_inter]| 0, sigma_inter);
  }", block = "functions")
    sv_sigma_rw <- stanvar(scode="real<lower=0> sigma_rw;", block="parameters")
    list(prior=pr, stanvar=sv_rw1 + sv_sigma_rw)
}

age_rw_prior <- prior_rw(1, 5, 0.5, class="b", nlpar="rw")

alt_model <- bf(Y ~ lin + rw, lin ~ 0 + x, rw ~ 1 + agecatF, nl=TRUE, center=FALSE)

d2 <- mutate(d, agecatF=factor(agecat))

get_prior(alt_model, d2)

alt_prior <- prior(normal(0, 2), class=b, coef=x, nlpar=lin) +
    age_rw_prior$prior

alt_fit <- brm(alt_model, data = d2,
               prior = alt_prior,
               stanvars = age_rw_prior$stanvar,
               refresh=250,
               empty=FALSE, init=0.2) ##, sample_prior="only")

alt_fit

model_pred <- tibble(agecatF=levels(d2$agecatF), x=0)

rvar(posterior_epred(alt_fit, newdata=model_pred))
1 Like

Thank you! Very cool!