Stan coding approach to two missing parameter situations

What are good Stan coding approaches to the following problems?

  • I would like to write Stan code for a regression model that may or may not have random effects to handle subject clustering (Gaussian random effects). If the current model does not have random effects I can set the number of clusters to 1 and the prior on the variance of random effects to have virtually all its mass at zero. Is there a much better way to handle such switching?
  • In general I would like to fit models with and without a certain set of parameters (representing special fixed effects). Can I pass from rstan a zero length data element (subset of covariance matrix) that is associated with those parameters? And how do I tell Stan that a vector of parameters is to be considered known (a vector of zeros) and not sampled, which will effectively exclude these parameters from consideration?

I’ve handled a similar problem by writing multiple .stan model files, and then putting common elements into another file which I #include. The choice of which model file to compile and pass data to is done at the higher level. I’m not sure this is the best way to do it, but I think it may reduce the number of bugs in the models since you just write one clear model that does one and only one thing – without any conditionals.

Stan has a quite limited set of constructs for conditionals and abstraction, so I think it’s best to keep as much of that stuff in R or Python as is possible.

That being said, I am a new user of stan, but this is my experience so far.

I think you are spot on for our own analyses. At present I’m packaging modeling functions in a R package for which I want the user to not have to run the compile step. That’s why it would work a bit better for me for the 8 possible permutations of analysis options to be handled within Stan. BTW the R brms package follows the model you mentioned.

I did just find a very useful discussion that I’m digesting now:

Hi Frank,

Does this post cover what you’re after?

This is also similar to the approach used by rstanarm

1 Like

I wish to add an alternative perspective.

The blog post claims that “now we don’t have to maintain two separate models”. I don’t think this is entirely accurate – you do have to maintain multiple models. If \sigma is known, that is a different model compared to when it is not known. You’re just using conditionals in stan to create this separation between the two models, and stan (while a remarkable tool for posterior sampling) is a programming language with only the most basic facilities for code-reuse and organization. They also say that “But what if the full model has several hundred lines…”. If this is the case, a lot of the code can (should?) probably be encapsulated into user-defined functions anyway, so it shouldn’t be too hard to share across multiple files.

Even in the simple example provided, it’s already checking the condition sigma_known? in three separate places! If we were to write a model that actually did have several hundred lines and conditional checks all over the place it would be a maintenance nightmare. If instead we wrote two separate .stan files, and encapsulated the common aspects in the model block into an #included file, the test sigma_known would only need to occur in 1 place and in a language with all the useful facilities for code reuse etc.

I’m not claiming that my suggestion is the best approach, and the Blog’s author (@martinmodrak) clearly knows way more about stan than I do, but I’m not entirely convinced.