BRMS: Can I fit only random effects for a subset of data, and not have this data impact the remaining parameter estimates?

(Intro copied from my question on Cross Validated.)

I have fitted a Bayesian multilevel model. Example:

$$
\begin{aligned}
Y &\sim \text{Normal}(\mu, \sigma) \\
\mu &= \alpha + \beta \cdot \text{time} + z_{\text{subject}} \\
z_{\text{subject}} &\sim \text{Normal}(\bar{z}, \sigma_z)
\end{aligned}
$$

I have fitted this to some data to get a posterior distribution for $\sigma$, $\alpha$, $\beta$, $\bar{z}$ and $\sigma_z$.

I now want to make a conditional prediction: if I observe a new $Y$ at time = 1 in a subject not in my original data, can I make predictions for observations in the same subject at time = 2?

Attempt

I tried to bind “fake” observations for new subjects at time = 1 to my dataset, fit the model, and predict Y at time = 2 for these subjects. This seems to work, but adding the fake observations not only gives an estimate of z for these subjects, it also impacts the remaining parameter estimates. I only want the fake data to impact the estimate of z for the specific subject.

I tried to use Y | weights(real) ~ 1 + time + (1 | subject), where real data had a weight of 1 and fake data a weight of 0, but then z was not even estimated for the fake data.

Is this possible and/or is there a better approach for this kind of inference?

A “minimal” reproducible example

First, generate some sample data, d.

library(tidyverse)
library(brms)
library(tidybayes) # for gather_draws() and stat_halfeye(), used below
library(patchwork)

set.seed(1)
d <- 
  expand_grid(
    id = factor(1:10),
    time = 1:4
  ) %>% 
  mutate(
    z = rep(rnorm(10, sd = 0.5), each = 4),        # subject-level random intercepts
    Y = 1.5 + 0.5 * time + z + rnorm(40, sd = 0.2) # linear trend + residual noise
  )

ggplot(d, aes(time, Y, color = id)) +
  geom_line() +
  geom_point()

Model the “real” data (M1)

Fit the model to the data with brm and show the fixed and random effect estimates.

M1 <- brm(Y ~ 1 + time + (1 | id),
          data = d, family = gaussian(),
          iter = 1000)

plot_posterior_draws <- function(M) {
  
  # Posterior draws of the subject-level random intercepts (r_id)
  rand <- gather_draws(M, r_id[id, ]) %>% 
    mutate(id = factor(id)) %>% 
    ggplot(aes(x = .value, y = id)) +
    stat_halfeye() +
    labs(title = "Random")

  # Posterior draws of the fixed effects
  fix <- gather_draws(M, b_Intercept, b_time) %>% 
    ggplot(aes(x = .value, y = .variable)) +
    stat_halfeye() +
    labs(title = "Fixed")

  rand / fix + plot_layout(heights = c(3, 1))
}

plot_posterior_draws(M1)

Estimate random effects for hypothetical observations (M2)

I want to estimate random effects for hypothetical subjects for which I have only a single observation each.

First, generate hypothetical observations and give each a unique id (above 100) to distinguish them from the real data.

hypothetical_data_for_predictions <- 
  expand_grid(time = 1:4, Y = 1:5) %>%      # 20 combinations of time and Y
  mutate(id = factor(row_number() + 100),   # unique ids 101:120
         weight = 0)

Then, add this hypothetical data to the real data and fit the model again.

d_w_hypothetical <- 
  d %>% 
  mutate(weight = 1) %>% 
  bind_rows(hypothetical_data_for_predictions)

M2 <- brm(Y ~ 1 + time + (1 | id),
          data = d_w_hypothetical, family = gaussian(),
          iter = 1000)

plot_posterior_draws(M2)

Here we get estimates of the random effects for the hypothetical data (ids 101:120), but adding the hypothetical data of course changes the rest of the model as well.
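The prediction step itself is not shown above; for completeness, a minimal sketch of how it could look (newdata_pred is just an illustrative name):

newdata_pred <- tibble(id = factor(101:120), time = 2)
pred <- posterior_predict(M2, newdata = newdata_pred)  # draws x 20 matrix
colMeans(pred)  # posterior predictive mean of Y at time = 2 per hypothetical id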

Attempt to avoid that hypothetical data changes the model (M3)

I attempted to use weights (1 for real data and 0 for hypothetical data) to avoid having the hypothetical data impact the model fit, but then the random effects for the hypothetical subjects aren’t estimated either.


M3 <- brm(Y | weights(weight) ~ 1 + time + (1 | id),
          data = d_w_hypothetical, family = gaussian(),
          iter = 1000)

plot_posterior_draws(M3)

I also tried giving the hypothetical data a weight of 1e-6, but that gives a similar result to M3.
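For reference, the 1e-6 attempt looked roughly like this (a sketch; M3b is just an illustrative name):

d_w_tiny <- d_w_hypothetical %>% 
  mutate(weight = if_else(weight == 0, 1e-6, weight))

M3b <- update(M3, newdata = d_w_tiny)  # refit the weighted model with the tiny weights
plot_posterior_draws(M3b)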

Intuitively, I would think that only the relative weights within a group should impact the random effect estimate for that group.


If I fit the model with the real data and only one hypothetical observation, the posterior would be conditional on observing the hypothetical event (which is perfect). However, that solution would require fitting nearly the same model a lot (hundreds) of times.

I think this is the heart of the problem - you don’t want your estimates of all the parameters to change after observing new data, but that goes against much of the spirit and theory of Bayesian inference. Imagine you collect new data that tell you, e.g., that $\beta$ is larger than you previously expected - on what grounds would it be reasonable to ignore this information? Understanding why you want this to happen might help in figuring out whether there is a better solution.

If just having a baseline model that doesn’t change over time is important, a direct and simple solution is to fit the model without any new observations and keep it. Then, for each new observation at time 1, you refit the model with the original dataset plus this one new row.
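A minimal sketch of this approach in terms of the example above (the new row’s values are illustrative):

# One hypothetical observation in a brand-new subject
new_row <- tibble(id = factor(101), time = 1, Y = 3)

# Refit on the original data plus this single row
M_new <- update(M1, newdata = bind_rows(d, new_row))

# Conditional prediction for the same subject at time = 2
pred_new <- posterior_predict(M_new, newdata = tibble(id = factor(101), time = 2))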

EDIT: It appears this is what you were already considering when you mentioned having to fit nearly the same model hundreds of times.

If this is really what you need, and you are willing to do a bit of a deeper dive, I would guess that you could use something like Pareto-smoothed importance sampling (PSIS) to fit the model once and then cheaply build approximate fits with one extra observation at a time. (The reason I guess this should work is that the loo package uses PSIS to estimate fits with one observation left out without refitting, so it sounds plausible it would also work to estimate fits with one observation added - see [1507.04544] Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC.) PSIS is implemented in the loo package: Pareto smoothed importance sampling (PSIS) — psis • loo
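To make the idea concrete, here is a rough sketch of how I imagine it could look - my guess, not tested code. The importance ratio for moving from “posterior given old data” to “posterior given old data plus one new observation” is the likelihood of the new observation under each draw, with the new subject’s z integrated out:

library(loo)

draws <- as_draws_df(M1)
y_new <- 3; t_new <- 1  # one hypothetical observation (values illustrative)

# log p(y_new | draw): Y_new ~ Normal(alpha + beta * t_new, sqrt(sigma^2 + sigma_z^2))
log_ratios <- dnorm(y_new,
                    mean = draws$b_Intercept + draws$b_time * t_new,
                    sd   = sqrt(draws$sigma^2 + draws$sd_id__Intercept^2),
                    log  = TRUE)

psis_result <- psis(log_ratios, r_eff = NA)
lw <- weights(psis_result)  # smoothed log importance weights

# Reweighting the draws of M1 by exp(lw) approximates the updated posterior.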

It might also be workable to take (a subset of) the samples you have for the initial dataset and, for each sample, fit a model where all coefficients are fixed to their values in the given sample and the only parameter to be estimated is the new z_i. If your likelihood is normal, the posterior for this restricted model has an analytical solution. You then take one sample of the new z_i from each of those fits to construct a posterior of just z_i, assuming the distribution of the other parameters didn’t change.
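A minimal sketch of that analytical update for the example above, assuming the brms parameterization in which the random effects are centered at zero (so $\bar{z}$ is absorbed into the intercept); the observation values are illustrative:

draws <- as_draws_df(M1)
y_new <- 3; t_new <- 1

# Conjugate normal-normal update of z for one observation, per posterior draw
mu0    <- draws$b_Intercept + draws$b_time * t_new  # predicted mean without z
prec   <- 1 / draws$sd_id__Intercept^2 + 1 / draws$sigma^2
z_mean <- ((y_new - mu0) / draws$sigma^2) / prec
z_sd   <- sqrt(1 / prec)

# One draw of the new subject's z per draw of the other parameters
z_new <- rnorm(nrow(draws), z_mean, z_sd)

# Conditional prediction for the same subject at time = 2
y_pred <- rnorm(nrow(draws),
                draws$b_Intercept + draws$b_time * 2 + z_new,
                draws$sigma)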

This unfortunately can’t work directly: an integer weight of k in brms works the same way as having k copies of the same observation in the dataset. Specifically, k = 0 means that the model behaves as if the corresponding row were not present in the dataset.
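A quick way to convince yourself of this (illustrative; up to MCMC noise the two fits agree):

# A weight of 2 on every row...
M_w2 <- brm(Y | weights(w) ~ 1 + time + (1 | id),
            data = mutate(d, w = 2), family = gaussian(), iter = 1000)

# ...behaves the same as duplicating every row:
M_dup <- brm(Y ~ 1 + time + (1 | id),
             data = bind_rows(d, d), family = gaussian(), iter = 1000)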

The same reasoning as above applies: not only relative, but also absolute weights are important - the larger the weights, the tighter the resulting posterior.

Hope that helps at least a little.


Thank you for the answer! I see how calculating loo without refitting the entire model is quite similar to my problem. I’m quite new to MCMC sampling, though, so implementing PSIS to solve the problem may be too big a mouthful.

I’m not entirely sure what I need. In the end, I just want to make predictions that are conditional on a single hypothetical observation in a subject outside my original data. E.g., given one observation in a new subject, what is the probability that the next observation in that individual is above a given value?
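With a model refit on the original data plus the single hypothetical observation (like the M_new sketch above), that probability is just the share of posterior predictive draws above the threshold (2.5 is an arbitrary example):

pred <- posterior_predict(M_new, newdata = tibble(id = factor(101), time = 2))
mean(pred > 2.5)  # P(next observation > 2.5 | one observation at time = 1)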

That makes the problem quite obvious. Is it an unreasonable idea that weights could be level-specific in a multilevel model? Or is it just not implemented in brms? (Edit: not that important for this question anymore. You convinced me that I should of course update my posterior based on each single hypothetical observation.)


A wild guess: maybe you are uneasy about fitting the new and old data together because you believe that the underlying process differs between the new and old data. If that’s the case, then maybe the best solution would be to explicitly model how the data-generating process changes. A similar problem was discussed at Bayesian parallels of weighted regression (but there the aim was to put more weight on recent observations).

I think this is what I do in M2. The problem is that the “new data” does not exist; it is simply a grid of predictors and outcomes, and some combinations are very unlikely. My predictions should only be conditional on a single hypothetical observation.

Another way to think about it is that I want to use the posterior from the model fit on “old data” as the prior in a model that only has a single (hypothetical) observation.

I had something a bit different in mind: if I create a new data column called batch, which is 1 for all the original data and gets a new value for each new data point, then I can use a formula like Y ~ 1 + time + (1 + time | batch) + (1 | subject). This allows the estimates for time and the intercept to deviate for the new data, so the estimates for the original data will not be affected much by it. I am not sure this is a good model for your data; I am just trying to explain a possibility.
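In terms of the reproducible example above, a sketch of what I mean (the batch values are illustrative, and id plays the role of subject):

d_batched <- bind_rows(
  d %>% mutate(batch = "original"),
  hypothetical_data_for_predictions %>% mutate(batch = as.character(id))
)

M4 <- brm(Y ~ 1 + time + (1 + time | batch) + (1 | id),
          data = d_batched, family = gaussian(),
          iter = 1000)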

Unfortunately, the best way to implement this is to actually fit the model to the old data + the new observations. There is no reliable method to convert a set of posterior samples into a prior for a new model without losing information.

Ahh. I cannot really comprehend the consequences of that model. Estimates for my “real” data do seem practically unchanged, but estimates for the hypothetical data are a lot wider than if I fit the model with only “real” data and one hypothetical observation.

The best solution right now seems to be fitting a lot of very similar models. My model is rather small and only takes about a minute to sample 4 × 2000 iterations, so I may just let my CPU suffer and fit 200 very similar models.
