Segfaults with large model


I’m trying to run a very large double-hierarchical model in which both location effects and dispersion effects are treated hierarchically. There are a little under 2M observations and almost as many nuisance parameters (mostly location and dispersion effects and their non-centred reparameterisations). I am encountering segfaults at various stages, and it is hard to produce a minimal reproducible example that would let me diagnose the problem(s). To troubleshoot, it would be good to know:

1/ If my Stan code has a bug (e.g. indexing runs past the end of an array), will Stan pick up on this, or could it cause a segfault?

2/ It has segfaulted after sampling using the ‘sampling’ function in rstan. Is it possible there is a problem allocating memory when writing to R, or would a warning be issued in this instance?

3/ The memory overheads are huge because of all the nuisance parameters. From what I’ve read, the pars argument of sampling can be used to prevent them being written into R, but the full set of posterior samples is still retained during sampling. Am I right in thinking that it is not possible to discard the posterior samples for a subset of parameters? I have no interest in them - not even their means.

The segfaults have occurred on macOS 10.13.6 and Scientific Linux 7.5 (rstan_2.18.2 in both cases).



If I understand your point correctly:

  • you can use the “pars” argument in your sampling call (or stan call) in rstan to select only the parameters you want to save,
  • setting save_warmup = FALSE also saves memory, because the warmup draws are not stored.
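
As a rough sketch of the two options above, here is how such a call might look. The toy model, the data, and the names mu, tau and z are hypothetical stand-ins (z plays the role of the nuisance parameters), not the model from the question. This only restricts what is stored in the returned stanfit object; it does not by itself answer whether the draws for the excluded parameters are held in memory while sampling runs.

```r
library(rstan)

# Hypothetical toy model: `z` stands in for the nuisance parameters,
# `mu` and `tau` for the quantities of interest.
model_code <- "
data {
  int<lower=1> N;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> tau;
  vector[N] z;            // non-centred nuisance effects
}
model {
  mu ~ normal(0, 5);
  tau ~ normal(0, 1);
  z ~ normal(0, 1);
  y ~ normal(mu + tau * z, 1);
}
"

set.seed(1)
toy_data <- list(N = 20, y = rnorm(20))  # made-up data for illustration
mod <- stan_model(model_code = model_code)

fit <- sampling(
  mod,
  data = toy_data,
  pars = c("mu", "tau"),  # keep only these parameters in the fit
  include = TRUE,         # TRUE (the default) means `pars` lists what to keep
  save_warmup = FALSE,    # do not store the warmup draws
  chains = 2, iter = 500, seed = 1
)

# Names of the quantities stored in the fit; `z` should not be among them.
print(names(fit))
```

If it is easier to list the parameters to drop than the ones to keep, the same call with pars = "z" and include = FALSE excludes z instead.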