A few questions about rstanarm's stan_jm() function

options(mc.cores = parallel::detectCores())
mod3 <- stan_jm(
                    formulaLong = list(
                                        sbp_me ~ group + bptime + (bptime | ID),
                                        dbp_me ~ group + bptime + (bptime | ID)
                    ),
                    formulaEvent = survival::Surv(newtime, dth) ~ group,
                    dataLong = BP,            # Longitudinal data 
                    dataEvent = surv, # Survival data
                    time_var = "bptime",      # Time variable in longitudinal data
                    chains = 4,             
                    iter = 4000,              # Number of iterations per chain
                    control = list(max_treedepth = 20, adapt_delta = 0.95), # Control options
                    
                    refresh = 100,           
                    #init = init_values,
                    seed = 12345              
)

Here’s our code for the joint model.
In the running process, I have a few questions:

  1. I set parallel running to accelarate the process. However, as I have set 4 chains in MCMC, only 2 chains proceed fast and successfully while the other 2 were stuck at the warm up period. How to solve the problem? Should I set inital values for the 4 chains and how to determine the appropariate value?
  2. In specific, as you mentioned earlier, what kind of starting point is “reasonable”? Will values around the mean value of baseline sbp or dbp help?
  3. We have 210,000 observations in the longitudinal data and 12000 observations in the survival data. How much can we reduce the warmup and iteration?

I’m not an expert in these joint longitudinal models, but here are some general comments:

Setting initial values may help but I’m really not sure what appropriate values would be for your specific situation. Based on the documentation for the init argument to stan_jm

The method for generating the initial values for the MCMC. The default is “prefit”, which uses those obtained from fitting separate longitudinal and time-to-event models prior to fitting the joint model. The separate longitudinal model is a (possibly multivariate) generalised linear mixed model estimated using variational bayes. This is achieved via the stan_mvmer function with algorithm = “meanfield”. The separate Cox model is estimated using coxph. This is achieved using the and time-to-event models prior to fitting the joint model. The separate models are estimated using the glmer and coxph functions. This should provide reasonable initial values which should aid the MCMC sampler. Parameters that cannot be obtained from fitting separate longitudinal and time-to-event models are initialised using the “random” method for stan. However it is recommended that any final analysis should ideally be performed with several MCMC chains each initiated from a different set of initial values; this can be obtained by setting init = “random”. In addition, other possibilities for specifying init are the same as those described for stan.

it looks like it tries to generate some useful inits for you automatically. If that’s not working well you could try switching to random inits or specifying them manually (but I’m not sure what good init values would be for your situation). If you switch to random inits (init="random") then I believe you can also pass the init_r argument to rstan via stan_jm’s ... argument. This can be used to control how dispersed the initial values are allowed to be. I think the default is 2 (i.e., generate values uniformly at random in [-2,2] on the unconstrained parameter space) so setting a smaller value (e.g. init_r = 0.1) may be helpful sometimes.

In terms of how many warmup and sampling iterations to use, it’s hard to say but you don’t necessarily need to start with the defaults if it’s taking a long time to run. There can be value in running shorter chains and examining the results so you get more immediate feedback. You can check diagnostics like effective sample size and r-hat to see if the shorter chains are sufficient.

Maybe someone who knows these models better (e.g. stan_jm developer @sambrilleman) will have some better ideas.