Issue when trying to fit a linear mixed effects model using Rstan

@Ara_Winter Thanks, I’ll check those links/fixes out on my Mac.

The version of R I’m using is 3.6.1, and the rstan version is 2.19.1.

BTW, I started running this same code on a Windows machine after seeing @mike-lawrence’s reply about priors.

Click the Refresh button to see progress of the chains
starting worker pid=1340 on localhost:11532 at 19:10:16.821
starting worker pid=10276 on localhost:11532 at 19:10:17.702
starting worker pid=6988 on localhost:11532 at 19:10:18.571
starting worker pid=2124 on localhost:11532 at 19:10:19.512

SAMPLING FOR MODEL 'init' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 0.063 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 630 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 5000 [  0%]  (Warmup)

SAMPLING FOR MODEL 'init' NOW (CHAIN 2).
Chain 2: 
Chain 2: Gradient evaluation took 0 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2: 
Chain 2: 
Chain 2: Iteration:    1 / 5000 [  0%]  (Warmup)

SAMPLING FOR MODEL 'init' NOW (CHAIN 3).
Chain 3: 
Chain 3: Gradient evaluation took 0.081 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 810 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3: 
Chain 3: 
Chain 3: Iteration:    1 / 5000 [  0%]  (Warmup)

SAMPLING FOR MODEL 'init' NOW (CHAIN 4).
Chain 4: 
Chain 4: Gradient evaluation took 0.11 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 1100 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4: 
Chain 4: 
Chain 4: Iteration:    1 / 5000 [  0%]  (Warmup)
Chain 1: Iteration:  500 / 5000 [ 10%]  (Warmup)
Chain 3: Iteration:  500 / 5000 [ 10%]  (Warmup)
Chain 4: Iteration:  500 / 5000 [ 10%]  (Warmup)
Chain 3: Iteration: 1000 / 5000 [ 20%]  (Warmup)
Chain 2: Iteration:  500 / 5000 [ 10%]  (Warmup)
Chain 3: Iteration: 1500 / 5000 [ 30%]  (Warmup)
Chain 1: Iteration: 1000 / 5000 [ 20%]  (Warmup)
Chain 4: Iteration: 1000 / 5000 [ 20%]  (Warmup)
Chain 4: Iteration: 1500 / 5000 [ 30%]  (Warmup)
Chain 3: Iteration: 2000 / 5000 [ 40%]  (Warmup)
Chain 4: Iteration: 2000 / 5000 [ 40%]  (Warmup)
Chain 3: Iteration: 2500 / 5000 [ 50%]  (Warmup)
Chain 3: Iteration: 2501 / 5000 [ 50%]  (Sampling)
Chain 4: Iteration: 2500 / 5000 [ 50%]  (Warmup)
Chain 4: Iteration: 2501 / 5000 [ 50%]  (Sampling)
Chain 4: Iteration: 3000 / 5000 [ 60%]  (Sampling)
Chain 4: Iteration: 3500 / 5000 [ 70%]  (Sampling)
Chain 4: Iteration: 4000 / 5000 [ 80%]  (Sampling)
Chain 4: Iteration: 4500 / 5000 [ 90%]  (Sampling)
Chain 4: Iteration: 5000 / 5000 [100%]  (Sampling)
Chain 4: 
Chain 4:  Elapsed Time: 1551.54 seconds (Warm-up)
Chain 4:                2991.6 seconds (Sampling)
Chain 4:                4543.14 seconds (Total)
Chain 4: 
Chain 2: Iteration: 1000 / 5000 [ 20%]  (Warmup)
Chain 1: Iteration: 1500 / 5000 [ 30%]  (Warmup)
Chain 3: Iteration: 3000 / 5000 [ 60%]  (Sampling)
Chain 3: Iteration: 3500 / 5000 [ 70%]  (Sampling)

And the last four steps started yesterday and are still running. Maybe there’s something wrong with my model…?

Hmmm, there might be! Let me take a look at the model. What does your data look like? Or at least a simulation of your data, if you can’t share the actual data.

Thanks to you and @bgoodri I managed to install Rcpp correctly. I don’t get the compilation error anymore.

I’m not at liberty to post my original dataset, so a simulated dataset like mine, simulated_data.csv (31.2 KB), is attached here. I also attached the Stan code for_sim_data.stan (2.0 KB) and the R code for_sim_data.R (1.3 KB) I use to run the model. I still get divergent transitions when I run this!

Thanks for all your help! :)

Ok cool. Glad Rcpp is up and going. Have you run the pairs plots? And do you have tidybayes installed so we can look at the diagnostic plots?
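In case it’s useful, here’s a minimal sketch of those diagnostics in R. It assumes the fitted stanfit object is called fit and that the parameters are named alpha, sigma, tau, kappa, and H as in the summary below; it uses bayesplot rather than tidybayes for the trace plots.

library(rstan)
library(bayesplot)

# Pairwise posterior plots; divergent iterations are highlighted.
pairs(fit, pars = c("alpha", "sigma", "tau", "kappa", "H"))

# Trace plots and the usual Rhat / n_eff summary.
mcmc_trace(as.array(fit), pars = c("sigma", "tau", "H"))
print(fit, pars = c("alpha", "sigma", "tau", "kappa", "H"))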

Thanks for the sim data, code, and R script. I’ll pull those down and take a look.

So the brute-force method is just to crank up the number of iterations. However, this is not very elegant and may not solve the problem ;)

:) I have not used tidybayes before, but I installed the package now and will check it out! Thanks for the suggestion. I started running the model just now and will upload the pairs() plot after it completes. Not sure it will, though: I tried running 4 chains with 2000 iterations and initial values. Two chains are already complete and one is 90% done, but the last one is still only 10% complete. That chain seems to lag/get stuck! If I run this model without passing initial values, it throws errors (like the one below) for the initial values rstan selects.

Rejecting initial value:
Chain 2:   Error evaluating the log probability at the initial value.
Chain 2: Exception: Exception: cholesky_decompose: Matrix m is not positive definite
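Just to illustrate one way around that, here is a hedged sketch of supplying an init function to stan() so the chains don’t start from random draws that make the covariance matrix numerically non-positive-definite. The parameter names, dimensions, and starting values are assumptions; adjust them to match for_sim_data.stan.

init_fun <- function(chain_id) {
  list(
    alpha = rnorm(6, 0, 0.1),  # fixed effects started near zero
    sigma = 1,                 # residual scale
    tau   = 1,                 # random-intercept scale
    kappa = 0.5,
    H     = 0.5                # Hurst parameter strictly inside (0, 1)
  )
}

fit <- stan(
  file   = "for_sim_data.stan",
  data   = stan_data,          # assumed name of the data list built in for_sim_data.R
  chains = 4,
  iter   = 2000,
  init   = init_fun
)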

The trace plots look like this!

This is the inference.

Inference for Stan model: init.
4 chains, each with iter=2000; warmup=1000; thin=10; 
post-warmup draws per chain=100, total post-warmup draws=400.

         mean  se_mean       sd    2.5%     25%     50%      75%    97.5% n_eff    Rhat
alpha[1]     0.47     0.02     0.03    0.42    0.46    0.47     0.47     0.50     2    7.05
alpha[2]    -0.12     0.03     0.04   -0.19   -0.17   -0.11    -0.09    -0.08     2    6.24
alpha[3]    -0.07     0.05     0.07   -0.19   -0.07   -0.04    -0.02    -0.02     2    9.60
alpha[4]     0.11     0.04     0.07    0.05    0.06    0.08     0.11     0.24     2    3.91
alpha[5]    -0.21     0.25     0.38   -0.97   -0.11   -0.02     0.03     0.03     2    3.12
alpha[6]    -0.02     0.01     0.03   -0.07   -0.04   -0.02     0.00     0.04     3    1.88
sigma    25165.72 26602.77 37716.41  925.58 1951.94 4724.39 27855.82 90313.34     2 2030.32
tau       1695.27   175.49   249.58 1245.50 1585.56 1745.62  1875.29  1947.82     2   15.06
kappa        0.38     0.25     0.36    0.00    0.18    0.29     0.41     1.00     2    9.08
H            0.03     0.03     0.05    0.00    0.00    0.00     0.02     0.16     2    6.04

Samples were drawn using NUTS(diag_e) at Mon May 25 19:47:18 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
Warning message:
In UseMethod("depth") :
  no applicable method for 'depth' applied to an object of class "NULL"

And the pairs() plot: I don’t know much about how to examine a pairs plot, but I know that this couldn’t look worse :(

I will change the initial values and run again.

Yeah, looks like something is going sideways. I likely won’t get to this until tomorrow (Nuevo México, United States time).

No problem. I appreciate how much you’ve helped me already. :)

Have you plotted your data? I know you said you can’t share it but does anything look out of sorts?

This is the plot of the outcome vs. time variables. The trajectories are for just a few subjects.
[plot: outcome trajectories over time for a few subjects]

And the summary stats don’t look out of sorts to me. I ran the random-intercepts model without a problem. The trace plots got out of shape when I added the stochastic process.

Thanks for looking into this! :)

I am not as familiar with cholesky_decompose errors. I know this came up over here in “My model with individual level random effects and group effects is really slow”, and that might be of some help.

Maybe @mike-lawrence knows more about this or knows someone that does? Also tagging in @bbbales2

Starting to get out of my expertise here, but sigma and tau still seem to have very large values in the posterior, and very large variance between chains on each. Is there a way to rescale the data so that these parameters are approximately on the ±10 scale?

Actually, I started looking at the data, and I think maybe I can provide some expertise in querying your model choice (rather than attempting to diagnose why your current choice isn’t working). The data seem to be structured such that you have multiple individuals, each with a unique set of predictors (age, menopause, ppfev, ppdlco), and multiple measures of the outcome FEV1 obtained at time intervals specified by timeSince_t0. You model the influence of the predictors as a simple linear effect on the mean outcome, and you model the influence of time as a Brownian stochastic process. I’m not familiar with the latter; is it common in your field? Does it encode deep knowledge of the data-generating process? If not, then I’d suggest maybe taking a look at the more common (to my admittedly limited exposure) approach of a Gaussian process (indeed, I suspect there is probably a mathematical connection between a Brownian stochastic process and a Gaussian process). GPs have been decently explored by Stan folks, and you may be able to leverage more expertise by using that approach than with something you coded by hand yourself.

You might take a look at this example on how to do a non-hierarchical GP, and then this example on how to allow variability in the GP from participant to participant.

Oh, and note that the high correlation between the age and menopause variables might induce inefficient sampling that you could avoid by using the QR trick.
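If it helps, here is a rough sketch of the QR trick done on the R side, assuming the correlated predictors (e.g. age and menopause) are columns of a design matrix X: decompose X = Q * R, fit the model on the Q scale, and transform the coefficients back afterwards.

n <- nrow(X)
qr_X  <- qr(X)
Q_ast <- qr.Q(qr_X) * sqrt(n - 1)   # pass Q_ast to Stan in place of X
R_ast <- qr.R(qr_X) / sqrt(n - 1)
R_ast_inv <- solve(R_ast)

# After sampling coefficients theta on the Q scale, recover the
# original-scale coefficients as beta = R_ast_inv %*% theta for each draw.
# With draws stored as rows:  beta_draws <- theta_draws %*% t(R_ast_inv)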

I actually thought about re-scaling/centering the data, but I can only center age, ppfev1 and ppdlco. I will scale those and see whether it changes anything.
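A small sketch of that centering/scaling, assuming the simulated data frame is called sim_data and the columns are named as described above (the exact names may differ in simulated_data.csv):

sim_data$age_c    <- as.numeric(scale(sim_data$age))
sim_data$ppfev_c  <- as.numeric(scale(sim_data$ppfev))
sim_data$ppdlco_c <- as.numeric(scale(sim_data$ppdlco))

# If sigma and tau keep coming out in the thousands, standardizing the
# outcome as well would put those scale parameters closer to the +/-10
# range suggested above; coefficients can be back-transformed afterwards.
sim_data$FEV1_z <- as.numeric(scale(sim_data$FEV1))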

Yes, you are correct. My model is a random-intercept model with a Brownian motion stochastic process. The random intercept is used to describe between-subject variation, and Brownian motion is used to capture the evolution of the outcome variable through time within each subject.

In biostatistics we commonly use BM and related stochastic processes to capture within-subject variation. According to a preliminary study I did using the R package “lmenssp”, the integrated Ornstein–Uhlenbeck (IOU) process, which is related to scaled BM, captures within-subject variation well. So my plan was to use a more general version of BM, fractional BM, for this data.

Yes, you are correct. BM and fractional BM are both (or rather, can be viewed as) Gaussian processes. I will try to use the links you attached for my model. Thank you! :)
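For reference, here is a sketch in R (not the actual code from for_sim_data.stan) of the fractional-BM covariance the model presumably builds before calling cholesky_decompose: Cov(B_H(t), B_H(s)) = 0.5 * (t^(2H) + s^(2H) - |t - s|^(2H)) for t, s > 0, which reduces to standard BM at H = 0.5. Adding a tiny jitter to the diagonal is a common way to keep the matrix numerically positive definite, which may be what the earlier cholesky_decompose error was about.

fbm_cov <- function(times, H, jitter = 1e-8) {
  K <- 0.5 * (outer(times^(2 * H), times^(2 * H), "+") -
                abs(outer(times, times, "-"))^(2 * H))
  K + diag(jitter, length(times))
}

# Quick check on a handful of positive observation times:
times <- c(0.5, 1, 2, 3.5)
chol(fbm_cov(times, H = 0.5))   # H = 0.5 gives min(t, s), i.e. standard BM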

Now my model is much, much better! It takes almost 22 hrs to run the model. I know it’s not the best, but now I can work on improving what I have. :)

Thank you @Ara_Winter and @mike-lawrence for your help! I’m very new to Stan (2–3 weeks) and have little to no experience with it, so I really appreciate your help. :)

Hey that’s looking way better. Glad it’s up and running.

I am not the best at optimizing models for speed. Hopefully someone else can jump in. I can look around and see who tackles those kinds of questions.

If GPUs are available (you’d want one per chain, so likely run on EC2 or GCE), there’s GPU-based parallelism that would speed up the GP part, and the new reduce_sum should pretty flexibly speed up the final likelihood part. That’s if it’s critical to get things going faster for multiple repeated applications of this model; it might not be worth working it all out for a one-off sampling scenario. (Though if you did work it out, I imagine lots of other folks would love a thorough how-to on setting up a one-chain-per-multi-core-plus-GPU instance on one of the cloud providers for models that use GPU acceleration and reduce_sum at the same time.)

Double-checking: no divergences and good rhats now?

All Rhats were <= 1. :) But there were 3 divergent transitions, and I’m using adapt_delta = 0.999. I got the following two warnings too.

2: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess 
3: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess 

I thought of running the model for more iterations. Any advice or solutions you could think of besides that?
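For what it’s worth, the brute-force version of that in R would look something like the sketch below (the argument values are illustrative, not recommendations; stan_data and for_sim_data.stan are the names used earlier in the thread). If thinning is still set to 10 as in the earlier summary, which kept only 100 post-warmup draws per chain, dropping it alone keeps many more draws.

fit <- stan(
  file    = "for_sim_data.stan",
  data    = stan_data,
  chains  = 4,
  iter    = 10000,
  warmup  = 5000,
  thin    = 1,      # keep every post-warmup draw
  control = list(adapt_delta = 0.999)
)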

I think that’s a great idea, but I don’t have GPUs available right now.

Thanks!

Does H need to be this?

Or can it be H ~ normal(0.5, 0.5)? I don’t think that will solve your ESS error, but it’s a little cleaner.

There might be a need to re-parameterize the model? It looks like some of your alphas are correlated in the pairs plot. I am not as familiar with these kinds of models, so maybe that’s just how they are.

Yes, H is in (0, 1) by definition, which is an open interval. But in the constraints for H, I added <lower=0, upper=1>. I don’t know whether that’s causing a problem.

Re-parameterization is a good idea. I will try that.

Thanks!
