Stan coding approach to two missing parameter situations

What are good Stan coding approaches to the following problems?

  • I would like to write Stan code for a regression model that may or may not have random effects to handle subject clustering (Gaussian random effects). If the current model does not have random effects I can set the number of clusters to 1 and the prior on the variance of random effects to have virtually all its mass at zero. Is there a much better way to handle such switching?
  • In general I would like to fit models with and without a certain set of parameters (representing special fixed effects). Can I pass from rstan a zero length data element (subset of covariance matrix) that is associated with those parameters? And how do I tell Stan that a vector of parameters is to be considered known (a vector of zeros) and not sampled, which will effectively exclude these parameters from consideration?
1 Like

I’ve handled a similar problem by writing multiple .stan model files, and then putting common elements into another file which I #include. The choice of which model file to compile and pass data to is done at the higher level. I’m not sure this is the best way to do it, but I think it may reduce the number of bugs in the models since you just write one clear model that does one and only one thing – without any conditionals.

Stan has a quite limited set of constructs for conditionals and abstraction, so I think it’s best to keep as much of that stuff in R or Python as is possible.

That being said, I am a new user of stan, but this is my experience so far.

I think you are spot on for our own analyses. At present I’m packaging modeling functions in a R package for which I want the user to not have to run the compile step. That’s why it would work a bit better for me for the 8 possible permutations of analysis options to be handled within Stan. BTW the R brms package follows the model you mentioned.

I did just find a very useful discussion that I’m digesting now: https://github.com/stan-dev/stan/issues/2377

Hi Frank,

Does this post cover what you’re after?

This is also similar to the approach used by rstanarm

1 Like

I wish to add an alternative perspective.

The blog post claims that “now we don’t have to maintain two separate models”. I don’t think this is entirely accurate – you do have to maintain multiple models. If \sigma is known, that is a different model compared to when it is not known. You’re just using conditionals in stan to create this separation between the two models, and stan (while a remarkable tool for posterior sampling) is a programming language with only the most basic facilities for code-reuse and organization. They also say that “But what if the full model has several hundred lines…”. If this is the case, a lot of the code can (should?) probably be encapsulated into user-defined functions anyway, so it shouldn’t be too hard to share across multiple files.

Even in the simple example provided, it’s already checking the condition sigma_known? in three separate places! If we were to write a model that actually did have several hundred lines and conditional checks all over the place it would be a maintenance nightmare. If instead we wrote two separate .stan files, and encapsulated the common aspects in the model block into an #included file, the test sigma_known would only need to occur in 1 place and in a language with all the useful facilities for code reuse etc.

I’m not claiming that my suggestion is the best approach, and the Blog’s author (@martinmodrak) clearly knows way more about stan than I do, but I’m not entirely convinced.

1 Like

Totally agree with @RJTK that the conditionals are not the only (and sometimes not the best) way to reuse code in Stan. It is just another tool, thanks for the context.

I think conditionals can make your code somewhat better if either:

  • Encapsulating into functions is annoying (e.g. when you need to pass fifteen arguments to the function)
  • You have multiple separate dimensions that modify your model (e.g. multiple qunatities that can all separately be known/treated as parameter) here the #include approach would force you to have a separate .stan file for each combination.

Obviously conditionals can become unweildy and the ultimate option is to start generating Stan code from R/Python/…, the way brms does, which is the most general approach.

2 Likes

I ended up with a highly satisfying all-in-one approach that handles all the permutations of model specification, and is much easier to maintain. I used zero-length arrays for irrelevant parameters. The resulting Stan code is here: https://github.com/harrelfe/stan/blob/master/lrmconppo.stan

This handles

  • random effects vs. no random effects
  • partial proportional odds model vs proportional odds model
  • two different priors for random effects standard deviation
  • constrained partial proportional odds model vs. unconstrained
1 Like