I am using brms to build a repeated event, multi-level survival model. There are about 30 observations for each individual in the data, so I am using a varying intercept for each individual. Additionally, there is too much data to estimate the model on the entire dataset, so I am going to estimate the model on a sample of 200,000 people. Once I have estimated the model, I need to score everyone in my dataset (~60 million) with their individual level varying intercept term. What is the best way to do this with brms?

If this helps, I have provided a survival model using the kidney dataset below. I withheld 37 observations to use as my out-of-sample population.

```r
library(brms)
data("kidney", package = "brms")
head(kidney, n = 3)
dim(kidney)

# adding random intercepts over patients
fit1 <- brm(time | cens(censored) ~ age + sex + disease + (1 | patient),
            data = kidney[1:40, ], family = weibull(), inits = "0",
            prior = set_prior("cauchy(0,2)", class = "sd"))
summary(fit1)
plot(fit1)
```

In the email you mentioned a certain approach explained in BDA3. I am on the road and don’t have my copy of BDA3 with me right now. Can you explain what you had in mind?

Thanks for the reply. I was looking at Gelman and Hill (2007). On page 274 they outline how to get a predicted y for a new observation in a new group. Now that I have spent some time looking at the problem, I am not sure if it is the solution I was looking for. It looks like the critical line on that page is

> Each simulation draw of \tilde{y} uses a different simulation of \tilde{\alpha} [individual-level varying-intercept], thus propagating the uncertainty about the new county into the uncertainty about the new house in this county.

While I do need the prediction, I also need to estimate the \tilde{\alpha} for all the other observations. That \tilde{\alpha} represents the frailty, or individual hazard. I assumed that in the process of prediction on a new observation in a new group that the \tilde{\alpha} would be estimated implicitly. I was hoping to estimate \alpha explicitly and either save the trace or the summary statistics.
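For individuals who were in the estimation sample, the \alpha_i draws are already part of the posterior, so you can save either the full trace or summary statistics directly. A sketch against `fit1` above (`posterior_samples()` was the extraction function at the time; recent brms versions prefer `as_draws_df()`, and the exact return shape of `ranef()` depends on the brms version):

```r
# Full posterior draws of the per-patient varying intercepts (the frailties),
# one column per patient that appeared in the fitting data:
alpha_draws <- posterior_samples(fit1, pars = "^r_patient\\[")

# Or just summary statistics (posterior mean and error per patient):
alpha_summary <- ranef(fit1)$patient
```

Saving `alpha_summary` rather than the full trace is the lighter option if all you need per individual is a point estimate of the frailty plus an uncertainty interval.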

If you are confused, this might be a fault of my understanding. I am relatively new to multilevel models in general, so I am trying to learn the theory while I build the model at the same time.

You may want to use the posterior_predict function with new data (that contains the new levels) and set allow_new_levels = TRUE to propagate the uncertainty inherent in the varying effects distribution.
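As a sketch with the kidney example above (`kidney[41:76, ]` is my stand-in for the held-out rows; check `?posterior_predict` in your brms version for the related `sample_new_levels` argument, which controls how the new intercepts are drawn):

```r
# Held-out individuals: their patient IDs were not in the fitting data,
# so allow_new_levels = TRUE draws a fresh intercept for each unseen level
# from the fitted varying-intercept distribution.
newdat <- kidney[41:76, ]
yrep <- posterior_predict(fit1, newdata = newdat, allow_new_levels = TRUE)
dim(yrep)  # posterior draws x nrow(newdat)
```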

Using the R code above, I built a model and looked at the posterior_predict output. It is 76 columns with 4000 rows. I assume each column is the posterior sample of \tilde{y} for one observation. Can you point me in the direction of a tutorial/text that shows how to extract the \tilde{\alpha} from those vectors?

I found a reference in BDA 3rd Edition for what I am trying to do. On page 387 they discuss extending their hierarchical linear model that predicts the US Democratic share of the two-party vote. I have provided the quote below. How do I get the posterior predictive distribution for the intercept term out of the posterior_predict method?

> However, we have no information on the coefficients of these predictor variables; they are not even included in the vector \beta that we have estimated from the data up to 1988. Since we have no data on these five new components of \beta, we must simulate their values from their posterior (predictive) distribution; that is, the coefficient for the year indicator is drawn as N(0,\tau^{2}_{\delta}), the non-South region × year coefficients are drawn as N(0,\tau^{2}_{\gamma1}), and the South × year coefficient is drawn as N(0,\tau^{2}_{\gamma2}), using the values \tau_{\delta}, \tau_{\gamma1}, \tau_{\gamma2} drawn from the posterior simulation.

You don’t. What you can do is model the uncertainty contained in the varying-intercept distribution, which is basically what sampling from that distribution means.

What you cannot do is get specific intercept estimates for those left-out groups, as they were not included in the original data.
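To make that concrete, the BDA3-style simulation quoted earlier can be reproduced by hand: draw a new \tilde{\alpha} from N(0, sd_patient) using the posterior draws of the group-level standard deviation. A hedged sketch (the parameter name `sd_patient__Intercept` follows brms's naming convention for `fit1` above; verify it against `parnames(fit1)` for your version):

```r
# Simulate the varying intercept for a *new* individual: for each posterior
# draw of the group-level sd, draw one alpha_tilde from N(0, sd).
sd_draws <- posterior_samples(fit1, pars = "sd_patient__Intercept")[, 1]
alpha_tilde <- rnorm(length(sd_draws), mean = 0, sd = sd_draws)

# Summarise the implied frailty distribution for a new individual:
quantile(alpha_tilde, probs = c(0.025, 0.5, 0.975))
```

Note that this is the population distribution of frailties, not an individual-specific estimate; it is the same distribution `allow_new_levels = TRUE` samples from internally, which is the point of the reply above.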