Issue when trying to fit a linear mixed effects model using Rstan

Hi,
I have written RStan code for a linear mixed-effects model with a random intercept and a Brownian motion stochastic process, using the stan() function. I also passed initial values into the function. But if I run it with 4 chains and 10000 iterations, it completes 2 (or fewer) chains and perhaps half of the remaining two, and does not progress further than that. The same thing happens when I change the number of chains/iterations. I appreciate any suggestions/help. Thanks!

``````
functions {
  // Cholesky factor of the fractional Brownian motion (fBM) covariance matrix
  // t: timeSince_t0 from ni_l to ni_u for one cluster; n: cluster size
  matrix vcovFBM(vector t, real H, int n, real kappa) {
    matrix[n, n] m;
    for (j in 1:n) {
      for (k in 1:n) {
        if (j == k) {
          // on the diagonal the general formula reduces to kappa * |t_j|^(2H)
          m[j, k] = kappa * pow(fabs(t[j]), 2 * H);
        } else {
          m[j, k] = kappa * 0.5 * (pow(fabs(t[j]), 2 * H) + pow(fabs(t[k]), 2 * H)
                                   - pow(fabs(t[j] - t[k]), 2 * H));
        }
      }
    }
    return cholesky_decompose(m);
  }
}

data {
  // Number of level-1 observations
  int<lower=0> N;
  // Number of level-2 clusters
  int<lower=0> Ni;
  // Cluster IDs
  int<lower=1> level2[N];
  // Number of covariates
  int K;
  // Continuous outcome
  real<lower=0> Y_ij[N];
  // Continuous predictors
  matrix<lower=0>[N, K] X_ij;
  // First and last observation index for each cluster
  int<lower=0> ni_l[Ni];
  int<lower=0> ni_u[Ni];
  vector<lower=0>[N] timeSince_t0;
  // Number of observations in each cluster
  int<lower=1> ni[Ni];
}

parameters {
  // Covariate coefficients
  vector[K] alpha;
  // Level-1 residual SD
  real<lower=0> sigma;

  // Level-2 random effects
  real ui[Ni];
  real<lower=0> tau;

  // fBM process values
  vector[N] wij;
  real<lower=0> kappa;
  // Hurst parameter (must be declared, since it is given a prior below)
  real<lower=0, upper=1> H;
}

transformed parameters {
  // Varying intercepts
  real alpha_0j[Ni];
  // Individual means
  vector[N] mu;

  for (j in 1:Ni) {
    alpha_0j[j] = ui[j];
  }
  for (i in 1:N) {
    mu[i] = alpha_0j[level2[i]] + X_ij[i, ] * alpha + wij[i];
  }
}

model {
  // Priors
  sigma ~ inv_gamma(1, 10^4);
  tau ~ inv_gamma(1, 10^4);
  alpha ~ normal(0, 10^4);
  H ~ uniform(0, 1);
  // note: inv_gamma requires a positive shape parameter; a shape of 0 is invalid
  kappa ~ inv_gamma(1, 10);
  // Random effects distribution
  ui ~ normal(0, tau);
  // fBM process, one block per cluster
  for (i in 1:Ni) {
    wij[ni_l[i]:ni_u[i]] ~ multi_normal_cholesky(rep_vector(0, ni[i]),
        vcovFBM(timeSince_t0[ni_l[i]:ni_u[i]], H, ni[i], kappa));
  }
  // Likelihood: N(mu, sigma^2) (Stan's normal uses the SD, not the variance)
  for (i in 1:N) {
    // equivalently: target += normal_lpdf(Y_ij[i] | mu[i], sigma);
    Y_ij[i] ~ normal(mu[i], sigma);
  }
}
``````

These priors seem like they might be trouble; do they truly encode your domain expertise? For example, the inverse gamma priors have vanishingly small mass for values less than about 2000 and have rather heavy tails such that 10% of the mass is above 10,000. And an SD of 10^4 for the normal prior conveys that your uncertainty spans 4 orders of magnitude, which seems extreme.

Maybe these are proper encodings of your domain expertise, in which case I'd suggest that you might get some benefit from rescaling the data so that the scales of the parameters are in the ±10 range, as the adaptation parts of Stan do better when parameters are in this range.
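For example, one way to rescale (a sketch on my part, assuming your `data` block declares `Y_ij` as `real<lower=0> Y_ij[N]` as in the model above; the names `Y_mean`, `Y_sd`, and `Y_std` are hypothetical) is to standardize the outcome inside a `transformed data` block rather than in R:

``````
transformed data {
  // standardize the outcome so the location/scale parameters live on roughly unit scale
  real Y_mean = mean(Y_ij);
  real Y_sd = sd(Y_ij);
  vector[N] Y_std;
  for (i in 1:N) {
    Y_std[i] = (Y_ij[i] - Y_mean) / Y_sd;
  }
}
``````

You would then model `Y_std` instead of `Y_ij` (the columns of `X_ij` can be standardized the same way) and transform the posterior draws back to the original scale afterwards.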


Thanks for your response. I tried what you advised, and now the chains run to completion quickly. But I get the following error/warning messages.

``````Warning messages:
1: There were 229 divergent transitions after warmup. Increasing adapt_delta above 0.99 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
2: Examine the pairs() plot to diagnose sampling problems

3: The largest R-hat is Inf, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat
4: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess
5: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess``````

What did you set your priors to? As @mike-lawrence pointed out, you will want at least weakly informative priors. I would also set your iterations to 2000 to start with.
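Also, divergent transitions in hierarchical models like this often come from the centered parameterization of the random intercepts (sampling `ui` and `tau` jointly puts the sampler in a funnel). A minimal non-centered sketch, adapted to the parameter names in your model (the `ui_raw` name is my own invention):

``````
parameters {
  // raw, standard-normal intercepts instead of ui ~ normal(0, tau)
  vector[Ni] ui_raw;
  real<lower=0> tau;
}
transformed parameters {
  // implies ui ~ normal(0, tau), but the sampler only sees ui_raw and tau
  vector[Ni] ui = tau * ui_raw;
}
model {
  ui_raw ~ normal(0, 1);
}
``````

This often removes divergences in funnel-shaped posteriors without having to push adapt_delta up.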

@Ara_Winter
Now the priors look like this:

``````// Prior part of Bayesian inference
sigma ~ inv_gamma(1, 10);
tau ~ inv_gamma(1, 10);
alpha ~ normal(0, 10);
H ~ uniform(0, 1);
kappa ~ inv_gamma(1, 1);
``````

I also decreased the iterations to 2000. I still get those warning messages with "Divergent transitions after warmup".


I tried to run this code for one chain, with just 2000 iterations, and without passing any initial values for the parameters. I got the following error. Now I'm not sure whether my model is right. I'm pretty new to Stan, so I don't know what this means. Any help would be much appreciated.

``````[1] "Error in sampler\$call_sampler(args_list[[i]]) : " "  c++ exception (unknown reason)"
error occurred during calling the sampler; sampling not done``````

Can you tell me the machine and OS (with version) you are running on?


@Ara_Winter
I'm using macOS Catalina, version 10.15.1 (19B88).
Processor: 1.6 GHz Dual-Core Intel Core i5
Memory: 8 GB 1600 MHz DDR3

Thanks!

Ah. That error comes up with Catalina. It might also be caused by something else too.

What is your version of R and rstan?


@Ara_Winter Thanks, I'll check those links/fixes out on my Mac.

The version of R I'm using is 3.6.1, and the rstan version is 2.19.1.

BTW, I started running this same code on a Windows machine after seeing @mike-lawrence's reply about priors.

``````Click the Refresh button to see progress of the chains
starting worker pid=1340 on localhost:11532 at 19:10:16.821
starting worker pid=10276 on localhost:11532 at 19:10:17.702
starting worker pid=6988 on localhost:11532 at 19:10:18.571
starting worker pid=2124 on localhost:11532 at 19:10:19.512

SAMPLING FOR MODEL 'init' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 0.063 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 630 seconds.
Chain 1:
Chain 1:
Chain 1: Iteration:    1 / 5000 [  0%]  (Warmup)

SAMPLING FOR MODEL 'init' NOW (CHAIN 2).
Chain 2:
Chain 2: Gradient evaluation took 0 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 2:
Chain 2:
Chain 2: Iteration:    1 / 5000 [  0%]  (Warmup)

SAMPLING FOR MODEL 'init' NOW (CHAIN 3).
Chain 3:
Chain 3: Gradient evaluation took 0.081 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 810 seconds.
Chain 3:
Chain 3:
Chain 3: Iteration:    1 / 5000 [  0%]  (Warmup)

SAMPLING FOR MODEL 'init' NOW (CHAIN 4).
Chain 4:
Chain 4: Gradient evaluation took 0.11 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 1100 seconds.
Chain 4:
Chain 4:
Chain 4: Iteration:    1 / 5000 [  0%]  (Warmup)
Chain 1: Iteration:  500 / 5000 [ 10%]  (Warmup)
Chain 3: Iteration:  500 / 5000 [ 10%]  (Warmup)
Chain 4: Iteration:  500 / 5000 [ 10%]  (Warmup)
Chain 3: Iteration: 1000 / 5000 [ 20%]  (Warmup)
Chain 2: Iteration:  500 / 5000 [ 10%]  (Warmup)
Chain 3: Iteration: 1500 / 5000 [ 30%]  (Warmup)
Chain 1: Iteration: 1000 / 5000 [ 20%]  (Warmup)
Chain 4: Iteration: 1000 / 5000 [ 20%]  (Warmup)
Chain 4: Iteration: 1500 / 5000 [ 30%]  (Warmup)
Chain 3: Iteration: 2000 / 5000 [ 40%]  (Warmup)
Chain 4: Iteration: 2000 / 5000 [ 40%]  (Warmup)
Chain 3: Iteration: 2500 / 5000 [ 50%]  (Warmup)
Chain 3: Iteration: 2501 / 5000 [ 50%]  (Sampling)
Chain 4: Iteration: 2500 / 5000 [ 50%]  (Warmup)
Chain 4: Iteration: 2501 / 5000 [ 50%]  (Sampling)
Chain 4: Iteration: 3000 / 5000 [ 60%]  (Sampling)
Chain 4: Iteration: 3500 / 5000 [ 70%]  (Sampling)
Chain 4: Iteration: 4000 / 5000 [ 80%]  (Sampling)
Chain 4: Iteration: 4500 / 5000 [ 90%]  (Sampling)
Chain 4: Iteration: 5000 / 5000 [100%]  (Sampling)
Chain 4:
Chain 4:  Elapsed Time: 1551.54 seconds (Warm-up)
Chain 4:                2991.6 seconds (Sampling)
Chain 4:                4543.14 seconds (Total)
Chain 4:
Chain 2: Iteration: 1000 / 5000 [ 20%]  (Warmup)
Chain 1: Iteration: 1500 / 5000 [ 30%]  (Warmup)
Chain 3: Iteration: 3000 / 5000 [ 60%]  (Sampling)
Chain 3: Iteration: 3500 / 5000 [ 70%]  (Sampling)
``````

And the last chains started yesterday and are still running. Maybe there's something wrong with my model?

Hmmm, there might be! Let me take a look at the model. What does your data look like? Or at least a simulation of your data, if you can't share the actual data.

Thanks to you and @bgoodri I managed to install Rcpp correctly. I don't get the compilation error anymore.

I'm not at liberty to post my original dataset, so a simulated dataset like mine, simulated_data.csv (31.2 KB), is attached here. I also attached the Stan code, for_sim_data.stan (2.0 KB), and the R code, for_sim_data.R (1.3 KB), that I use to run the model. I still get divergent transitions when I run this!

Thanks for all your help! :)


Ok cool. Glad Rcpp is up and going. Have you run the pairs plots? And do you have tidybayes installed, so we can look at the diagnostic plots?

Thanks for the sim data, code, and R script. I'll pull those down and take a look.

So the brute-force method is just to crank up the number of iterations. However, this is not very elegant and may not solve the problem ;)


:) I have not used tidybayes before, but I installed the package now and will check it out! Thanks for the suggestion. I started running the model just now and will upload the pairs() plot after it completes. Not sure it will, though: I tried running 4 chains with 2000 iterations and initial values. Two chains have already completed and one is 90% complete, but the last one is still only 10% complete. That chain seems to lag or get stuck! If I run this model without passing initial values, it throws errors (like the one below) for the initial values rstan selects.

``````Rejecting initial value:
Chain 2:   Error evaluating the log probability at the initial value.
Chain 2: Exception: Exception: cholesky_decompose: Matrix m is not positive definite``````

The trace plots look like this!

This is the inference.

``````Inference for Stan model: init.
4 chains, each with iter=2000; warmup=1000; thin=10;
post-warmup draws per chain=100, total post-warmup draws=400.

mean  se_mean       sd    2.5%     25%     50%      75%    97.5% n_eff    Rhat
alpha[1]     0.47     0.02     0.03    0.42    0.46    0.47     0.47     0.50     2    7.05
alpha[2]    -0.12     0.03     0.04   -0.19   -0.17   -0.11    -0.09    -0.08     2    6.24
alpha[3]    -0.07     0.05     0.07   -0.19   -0.07   -0.04    -0.02    -0.02     2    9.60
alpha[4]     0.11     0.04     0.07    0.05    0.06    0.08     0.11     0.24     2    3.91
alpha[5]    -0.21     0.25     0.38   -0.97   -0.11   -0.02     0.03     0.03     2    3.12
alpha[6]    -0.02     0.01     0.03   -0.07   -0.04   -0.02     0.00     0.04     3    1.88
sigma    25165.72 26602.77 37716.41  925.58 1951.94 4724.39 27855.82 90313.34     2 2030.32
tau       1695.27   175.49   249.58 1245.50 1585.56 1745.62  1875.29  1947.82     2   15.06
kappa        0.38     0.25     0.36    0.00    0.18    0.29     0.41     1.00     2    9.08
H            0.03     0.03     0.05    0.00    0.00    0.00     0.02     0.16     2    6.04

Samples were drawn using NUTS(diag_e) at Mon May 25 19:47:18 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
Warning message:
In UseMethod("depth") :
no applicable method for 'depth' applied to an object of class "NULL"
``````

And the pairs(). I don't know much about how to examine a pairs plot, but I know that this couldn't look worse :(

I will change the initial values and run again.

Yeah, looks like something is going sideways. I likely won't get to this until tomorrow (Nuevo México, United States time).


No problem. I appreciate how much you've helped me already. :)

Have you plotted your data? I know you said you canâ€™t share it but does anything look out of sorts?

This is the plot of the outcome vs. time variables. The trajectories are for just a few subjects.

And the summary stats donâ€™t look out of sorts to me. I ran the random-intercepts model without a problem. The trace plots got out of shape when I added the stochastic process.

Thanks for looking into this! :)


I am not as familiar with cholesky_decompose errors. I know this came up over here: My model with individual level random effects and group effects is really slow, and that thread might be of some help.
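One common workaround for "Matrix m is not positive definite" (a general suggestion, not something specific to this thread) is to add a small positive jitter to the diagonal before the decomposition, so that numerical round-off cannot push a theoretically valid covariance matrix just below positive definite. Inside a function like `vcovFBM`, that would look like:

``````
// after filling m, stabilize the diagonal before decomposing
for (j in 1:n) {
  m[j, j] = m[j, j] + 1e-8; // small jitter; tune to the scale of your covariances
}
return cholesky_decompose(m);
``````

If the error persists even with a jitter, that usually points to the matrix itself being mis-specified (for example, a scale factor applied to the off-diagonal entries but not the diagonal) rather than to round-off.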