R packages for Stan 2.17.1 released


Actually, they were released before Christmas. We will have to push some fixes to CRAN in the next couple of days to fix things on Solaris, but otherwise they should be fine for StanCon.

The main change to StanHeaders (the R package that includes the Stan Math Library) is a fix for the massive slowdown caused by using strings rather than pointers to character arrays for function names.

The main change to rstan is that it should work with a Mac that uses the upstream version of clang (which CRAN now uses) rather than Xcode’s clang. It is necessary that the upstream clang be installed in /usr/local/clang4/bin/clang++. After that, there should be no more “unknown exception” messages.

The changes to rstanarm were far more extensive. In terms of user-visible changes to existing functions:

  • The default prior on the auxiliary parameter in GLMs (e.g. sigma if the likelihood is Gaussian) is exponential rather than Cauchy. You can still specify prior_aux = cauchy() explicitly, although as always you probably will not get exactly the same numerical results as in previous rstanarm versions.
  • Aki’s new hierarchical shrinkage priors made it in ( https://twitter.com/avehtari/status/948670477240864768 ) and have new arguments, slab_df and slab_scale, which default to 4 and 2.5 respectively.
  • Models estimated with stan_gamm4 may be a bit different now that stan_gamm4 is using mgcv::jagam to parse the formula rather than mgcv::gamm or gamm4::gamm4.
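As a minimal sketch of the first two bullets, assuming rstanarm 2.17.x is installed: the built-in mtcars data, the Cauchy scale of 5, and the small number of chains and iterations are illustrative choices, not recommendations.

```r
library(rstanarm)

# Illustrative data only: the built-in mtcars data set.
# Ask for a (half-)Cauchy prior on sigma explicitly instead of the new
# exponential default. The scale of 5 is an assumption for this example.
fit_cauchy <- stan_glm(mpg ~ wt + hp, data = mtcars,
                       family = gaussian(),
                       prior_aux = cauchy(0, 5),
                       chains = 2, iter = 1000, seed = 123, refresh = 0)

# Hierarchical shrinkage prior on the coefficients, spelling out the
# new slab arguments at their default values.
fit_hs <- stan_glm(mpg ~ wt + hp + qsec + drat, data = mtcars,
                   family = gaussian(),
                   prior = hs(slab_df = 4, slab_scale = 2.5),
                   chains = 2, iter = 1000, seed = 123, refresh = 0)

# Shows which priors were actually used, including any rescaling.
prior_summary(fit_cauchy)
prior_summary(fit_hs)
```

Calling prior_summary is the easiest way to check which priors a fitted model really used.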

You can also use (some of) the families in the mgcv package with rstanarm model fitting functions. For example, the mgcv::betar family allows you to fit models with a beta likelihood that include group-specific and/or non-linear terms using stan_glmer or stan_gamm4, as long as you are not parameterizing the auxiliary parameter. By contrast, the existing stan_betareg function allows you to parameterize the auxiliary parameter with covariates but does not permit group-specific or non-linear terms.
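Here is a rough sketch of a beta regression with a group-specific intercept, assuming rstanarm 2.17.x and mgcv are installed. The data are simulated purely for illustration, and every variable name is made up.

```r
library(rstanarm)

set.seed(1)
# Simulated, purely illustrative data: an outcome in (0, 1) observed in 10 groups.
n   <- 200
g   <- factor(sample(1:10, n, replace = TRUE))
x   <- rnorm(n)
b   <- rnorm(10, sd = 0.5)                      # group-specific intercepts
mu  <- plogis(0.3 + 0.8 * x + b[as.integer(g)]) # conditional mean
phi <- 10                                       # precision (auxiliary parameter)
dat <- data.frame(y = rbeta(n, mu * phi, (1 - mu) * phi), x = x, g = g)

# Beta likelihood with an lme4-style group-specific term.
# The auxiliary parameter is a single scalar here, not modeled with covariates.
fit_beta <- stan_glmer(y ~ x + (1 | g), data = dat,
                       family = mgcv::betar,
                       chains = 2, iter = 1000, seed = 1, refresh = 0)

print(fit_beta, digits = 2)
```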

A bunch of new functions were added to rstanarm, such as

  • bayes_R2, which calculates the posterior distribution of the ratio of the variance (over the observations) of the conditional mean to the variance of the predictive distribution for GLMs (including those with group-specific terms that you can condition on or integrate over). Andy already blogged about it.
  • stan_nlmer, which uses the same likelihood as nlmer in the lme4 package. It fits models with a Gaussian likelihood but with a non-linear transformation of the linear predictor that can depend on group-specific terms. Even if you hate Bayesianism, the posterior means of these models (conditional on the group-specific terms) may well be more reliable than MLEs (that integrate over the group-specific terms), but watch out for multimodality in the posterior distribution.
  • stan_clogit, which uses the same likelihood as clogit in the survival package. It fits “case-control” models where, by the research design, a fixed number of observations within a group will be successes, such as a competition with exactly one winner per contest. The stan_clogit formula is a bit different from the clogit one in that the former can have lme4-style group-specific terms.
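The three functions above can be tried out roughly as follows, assuming rstanarm 2.17.x is installed. The data sets (mtcars, Orange, infert) ship with R and are used here only as stand-ins. The short chains keep the run time down and are not meant for real inference.

```r
library(rstanarm)

# bayes_R2: one R^2 value per posterior draw.
fit_lm <- stan_glm(mpg ~ wt + hp, data = mtcars,
                   chains = 2, iter = 1000, seed = 123, refresh = 0)
r2 <- bayes_R2(fit_lm)
summary(r2)

# stan_nlmer: logistic growth curve whose asymptote varies by tree.
# The variables are rescaled so that the default priors are on a sensible scale.
orange <- as.data.frame(datasets::Orange)
orange$circumference <- orange$circumference / 100
orange$age <- orange$age / 100
fit_nl <- stan_nlmer(circumference ~ SSlogis(age, Asym, xmid, scal) ~ Asym | Tree,
                     data = orange,
                     chains = 2, iter = 1000, seed = 123, refresh = 0)

# stan_clogit: the data have to be sorted by the strata variable, and the
# formula may contain an lme4-style term such as (1 | education).
dat <- infert[order(infert$stratum), ]
fit_cl <- stan_clogit(case ~ spontaneous + induced + (1 | education),
                      strata = stratum, data = dat,
                      chains = 2, iter = 1000, seed = 123, refresh = 0)
```

As the stan_nlmer bullet warns, it is worth looking at the convergence diagnostics of the non-linear model for signs of multimodality before trusting its posterior means.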

Finally, Sam Brilleman contributed a ton of code related to his Ph.D. dissertation. The stan_jm function is a lot like the jointModel function in the JM package or the jointModelBayes function in the JMbayes package in that it estimates a “joint” model for the survival time and the severity of the symptoms for people with terminal diseases. These two things are obviously not conditionally independent, and you can specify a variety of forms for the association structure. Sam also contributed stan_mvmer, which is sort of a generalization of what stan_jm does but for lme4-style models that have multiple outcomes with correlated error terms.
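Below is a small sketch of both functions, assuming rstanarm 2.17.x and the survival package are installed. It uses the pbcLong and pbcSurv example data that ship with rstanarm. The single short chain is only there to keep the example fast.

```r
library(rstanarm)
library(survival)  # provides Surv() for the event submodel

# Joint model: a longitudinal submodel for log bilirubin and an event
# submodel for death, linked through the current value of the linear
# predictor of the longitudinal submodel (assoc = "etavalue").
fit_jm <- stan_jm(formulaLong  = logBili ~ year + (1 | id),
                  dataLong     = pbcLong,
                  formulaEvent = Surv(futimeYears, death) ~ sex + trt,
                  dataEvent    = pbcSurv,
                  time_var     = "year",
                  assoc        = "etavalue",
                  chains = 1, iter = 1000, seed = 12345, refresh = 0)

# Multivariate model: two longitudinal outcomes whose group-specific
# terms for the same id are allowed to be correlated.
fit_mv <- stan_mvmer(formula = list(logBili ~ year + (1 | id),
                                    albumin ~ sex + year + (year | id)),
                     data = pbcLong,
                     chains = 1, iter = 1000, seed = 12345, refresh = 0)

print(fit_jm)
```

Other association structures are selected through the assoc argument. The documentation for stan_jm lists the available options.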

We are using a different build process for rstanarm now. New R packages that come with compiled Stan models should follow the new pattern, which is enforced by the rstan_package.skeleton function in the rstantools R package. It is probably a good idea for existing R packages that come with compiled Stan models to migrate over to the new pattern, although not necessarily ASAP. Under the old way of doing things, Stan programs were in the exec/ folder and chunks of Stan code were in the inst/chunks/ folder. Now, Stan programs are in the root of the src/stan_files/ folder, and chunks of Stan code can be in subfolders of src/stan_files/ and are included via the “native” method in stanc (the #include statement must be flush left and there can be no whitespace or comments after the file name).
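To make the new layout concrete, here is a throwaway illustration that uses only base R to lay out the folders in a temporary directory. The package name, file names, and chunk contents are all hypothetical. The exact form of the path after #include (written here relative to src/stan_files/) is an assumption too, so check it against a package generated by rstantools.

```r
# Build an illustrative copy of the new layout in a temporary directory.
# Every name below (mypkg, chunks, sigma_decl.stan, normal.stan) is hypothetical.
pkg       <- file.path(tempdir(), "mypkg")
stan_dir  <- file.path(pkg, "src", "stan_files")
chunk_dir <- file.path(stan_dir, "chunks")
dir.create(chunk_dir, recursive = TRUE, showWarnings = FALSE)

# A chunk of Stan code that lives in a subfolder of src/stan_files/.
writeLines("  real<lower=0> sigma;", file.path(chunk_dir, "sigma_decl.stan"))

# A Stan program in the root of src/stan_files/ that includes the chunk.
# The #include line is flush left and has nothing after the file name.
program <- c(
  "data {",
  "  int<lower=0> N;",
  "  vector[N] y;",
  "}",
  "parameters {",
  "  real mu;",
  "#include /chunks/sigma_decl.stan",
  "}",
  "model {",
  "  y ~ normal(mu, sigma);",
  "}"
)
writeLines(program, file.path(stan_dir, "normal.stan"))

# Show the resulting tree, relative to the package root.
list.files(pkg, recursive = TRUE)
```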

The new src/Makevars or src/Makevars.win files cause the Stan programs to be compiled separately and then combined into a shared object (whereas before we were gluing all the C++ files into a massive C++ file and compiling that). The upshot of this is that (packages like) rstanarm can now be compiled with much less RAM than before. Conversely, it takes much longer to build rstanarm from source unless you have previously set the environment variable MAKEFLAGS = -j4 or something (which negates the RAM savings). If you are not using Windows, the best of both worlds can be achieved by using link-time optimization in your local ~/.R/Makevars file, and we may try to accomplish that automatically in the next version of rstanarm.
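One way to set MAKEFLAGS from within R before building from source is sketched below. The value -j4 is just the example from the text. Pick a number of jobs that suits your cores and RAM, since each parallel job needs its own memory.

```r
# Let make run up to 4 compile jobs in parallel.
# This has to be set before the build starts.
Sys.setenv(MAKEFLAGS = "-j4")

# Build rstanarm from source. The guard only keeps scripted,
# non-interactive runs from triggering an installation by accident.
if (interactive()) {
  install.packages("rstanarm", type = "source")
}
```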


Thanks particularly for dealing with the Mac issues—these upstream errors are a pain (Ben’s working without actually having a Mac himself). Everyone else probably doesn’t realize just how much work went into this particular release for compatibility.