Sampling difficulties with hierarchical/CAR growth model with truncated likelihood

Hi Everyone,
I have been adapting some hierarchical nonlinear three-parameter growth models with a left-truncated normal likelihood from JAGS into Stan for a manuscript revision. I wanted to include some spatial dependence by modeling the mean of the hyperparameters as a linear function of latitude, with the variance taken from a conditional autoregressive (CAR) model. One model includes the CAR parameterization and one does not. The CAR model was adapted from the Stan case study (http://mc-stan.org/users/documentation/case-studies/mbjoseph-CARStan.html). However, I have been having difficulty with mixing, divergences, and slow sampling times for both models. To limit these issues I have vectorized the code and created non-centered versions, but I am still having problems with all three. The truncated normal slows things down considerably, and one caveat with the CAR model is that one region has no data, though I still want neighboring regions to inform one another (e.g. no data from Oregon, but California and Washington can still share information).

I was wondering whether the issues are with how the model is configured or something I am missing. The data, R code, and Stan code for the two models are attached.

Any help would be greatly appreciated.
Cheers,
Grant

STAN_Hierarchical_VBGF_Runs.R (6.6 KB)
VBGF_Data.csv (1.1 MB)

Sorry for the late response (been saying that a lot as I try to work through the backlog).

Spatial models can be very tricky to fit, especially with hierarchical structure.

When you say you have issues with mixing, if you run for a lot of iterations, do you get reasonable R-hat values? If not, what’s happening if you look at the traceplots? Can you tighten any of the priors?

Have you tried running it with data simulated from the model? That might help diagnose whether the problem is due to model misspecification—sometimes if the model doesn’t fit, the sampling and mixing will be bad.
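A simulation from a stripped-down version of the model might look like this (a Python sketch, not your Stan code; the parameter values and the truncation point are made up, and the names linf, k, t0, sigma just follow the usual three-parameter VBGF setup):

```python
# Simulate VBGF lengths-at-age with a left-truncated normal observation model.
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(1)

linf, k, t0, sigma = 60.0, 0.25, -0.5, 3.0  # assumed "true" values
lower = 20.0                                # assumed truncation point (selectivity)

age = rng.uniform(1, 15, size=500)
mu = linf * (1 - np.exp(-k * (age - t0)))   # VBGF mean length at age

# truncnorm is parameterized on the standard scale, so shift the bound:
a = (lower - mu) / sigma
length = truncnorm.rvs(a, np.inf, loc=mu, scale=sigma, random_state=rng)

print(length.min() >= lower)  # True: all simulated lengths respect the bound
```

If the sampler struggles even on data generated this way, the problem is the parameterization rather than model misspecification.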

Sorry for the generalities, but I don’t know much about CAR models myself. The ICAR models (there’s another case study) are much simpler but less expressive.

Whenever you see exp nested in log, it’s a cause for concern about stability:

log(linf .* (1- exp( -k .* (age - t0))));

Luckily it’s a common form, so we have a built-in replacement that’s more stable:

log(linf) + log1m_exp(-k .* (age - t0));
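To see why, here’s the cancellation in plain floating point (a Python sketch with a made-up small value of k * (age - t0); expm1 plays the role of the stable building block inside log1m_exp):

```python
# For small x, exp(-x) is nearly 1, so 1 - exp(-x) cancels catastrophically;
# expm1(-x) computes exp(-x) - 1 directly and keeps full precision.
import math

x = 1e-12  # a tiny k * (age - t0)

naive  = math.log(1.0 - math.exp(-x))   # cancellation before the log
stable = math.log(-math.expm1(-x))      # what log1m_exp(-x) computes

print(naive, stable)  # the naive version is off in the 5th decimal place
```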

This probably isn’t your problem, though. Multiplying parameters, as happens here with k * t0, can be problematic for identifiability, usually manifesting as a banana-like posterior (on the unconstrained scale).

The LCCDF itself is very unstable as we’re just delegating to the CDF on the standard (not log) scale.
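For example (a Python sketch using scipy’s normal as a stand-in; the tail value is made up):

```python
# A CCDF computed on the standard scale underflows to 0 far in the tail,
# so its log is -inf; a log-scale implementation stays finite.
import numpy as np
from scipy.stats import norm

z = 40.0  # far out in the upper tail

with np.errstate(divide="ignore"):
    unstable = np.log(norm.sf(z))  # sf(40) underflows to 0.0, log gives -inf
stable = norm.logsf(z)             # evaluated on the log scale: finite

print(unstable, stable)
```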

Also, replace inverse(L_linf) * alpha_linf with L_linf \ alpha_linf. It compiles fine for me in RStan 2.16.
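This is the usual advice to never form an explicit inverse: the backslash form does a triangular solve directly. A quick numpy/scipy check (made-up matrix) that the two forms agree while the solve skips the inverse:

```python
# inverse(L) * alpha vs. L \ alpha: same answer, but the solve is
# cheaper and more accurate than materializing inv(L).
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
L = np.linalg.cholesky(A @ A.T + 5 * np.eye(5))  # a lower-triangular factor
alpha = rng.normal(size=5)

via_inverse = np.linalg.inv(L) @ alpha              # inverse(L) * alpha
via_solve = solve_triangular(L, alpha, lower=True)  # L \ alpha

print(np.allclose(via_inverse, via_solve))  # True
```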

Hi Bob,
Thank you for the advice! I am getting the sense that spatial models are difficult, but I was also surprised that the first model, which does include the CAR component, is proving to be tricky. It took about two weeks each to run the two models. For both models, R-hat was generally > 3 and the effective sample size was mostly around 2. For the CAR model, two of the chains got stuck on their initial values and never moved, and one chain sampled (but the autocorrelation was high). For example, here is one traceplot:

(attached traceplot image: Rplot)

I will try tightening the priors and changing the code to the log1m_exp replacement, and run those to see what works (hopefully it doesn’t take two weeks again). I have not run it with data simulated from the model; I will give that a go.

The left truncation seems to be the main issue for me in terms of sampling time. Would you recommend using normal(log_mu, sigma) T(log(selectivity), ) inside a for loop instead of the lccdf? That slows things down considerably as well, however.
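For concreteness, both forms target the same density: T(lower, ) applies the same log-survival correction per observation that the explicit lccdf term supplies. A Python sketch (made-up values) checking that the looped and vectorized corrections agree:

```python
# Left-truncated normal log likelihood: normal logpdf minus the log
# survival probability at the truncation point, summed over observations.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
mu = rng.uniform(30, 50, size=1000)
sigma = 3.0
lower = 25.0
y = np.maximum(mu + rng.normal(scale=sigma, size=1000), lower + 1e-6)

loop = sum(norm.logpdf(yi, mi, sigma) - norm.logsf(lower, mi, sigma)
           for yi, mi in zip(y, mu))
vectorized = np.sum(norm.logpdf(y, mu, sigma) - norm.logsf(lower, mu, sigma))

print(np.isclose(loop, vectorized))  # True
```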

I’m suspicious of these priors:

  sigma ~ inv_gamma(2,2);
  sigma_vbgf ~ inv_gamma(2,2);

I would guess that for those chains which get stuck, you get sigma or sigma_vbgf values which are not sensible compared to what you know about the phenomenon you are modelling.
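For instance, you can look at what inv_gamma(2, 2) implies before seeing any data (a quick Python check; the quantiles are just illustrative):

```python
# Prior quantiles of inv_gamma(2, 2): is this range sensible for your
# scale parameters compared to the measurement scale of the data?
import numpy as np
from scipy.stats import invgamma

prior = invgamma(a=2, scale=2)
q = prior.ppf([0.05, 0.5, 0.95])
print(q)  # roughly [0.42, 1.19, 5.63]
```

If the plausible residual scale for your lengths sits well outside that 90% interval, the prior is fighting the likelihood.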

Could you experiment with a smaller data set, with fewer states? That would more quickly give you some idea of how to improve the model and priors.

This probably means the model doesn’t fit the data very well. Andrew calls this the “folk theorem”.

Hi Bob,
Thank you for the help! The left-truncated likelihood is slowing things down a lot, so I am going to start back from the beginning without it and re-evaluate whether I need it. I will post an update if there are any issues or if it works!

Cheers,