Multivariate Normal: Scale invariant priors for regularization?

I’m using Stan to build a general type of model that I’d like to fit very frequently on hundreds of similar, but not identical, data sets.

I’m interested in selecting hyperparameter values for out-of-sample prediction performance, so effectively more for regularization than for modelling known prior information. Ideally, I’d like those hyperparameters to be on some intuitive scale like 0 to 1 (e.g. from a flat prior to zero variance).

The usual solution would be to rescale all data before model fitting, so that constant scale values on priors make sense across data sets. However, my model involves a multivariate normal that provides a prior on other latent state variables, and as such, the ‘data’ (parameters) are internal to the Stan model.

One solution would be to fit the model twice:

  1. Estimate the multivariate normal with weakly informative priors
  2. Take the estimates of the mvnorm’s scale parameters, discount them by some amount, and feed them in as known data for the scale parameters of the mvnorm during a second model fit
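
As a rough illustration of step 2, here is a minimal Python sketch of the "discount and feed back in" idea. Everything named here is an assumption made for illustration: the helper `discounted_scales`, the data names `K` and `sigma_fixed`, and the use of the posterior mean as the point estimate are not taken from any actual model.

```python
from statistics import mean


def discounted_scales(scale_draws, discount):
    """Return per-dimension posterior-mean scales multiplied by `discount`.

    scale_draws: list of posterior draws, each a list with one scale per
                 mvnorm dimension (hypothetical output of the first fit).
    discount:    shrinkage factor in (0, 1]; 1 means no discounting.
    """
    if not 0.0 < discount <= 1.0:
        raise ValueError("discount must be in (0, 1]")
    n_dims = len(scale_draws[0])
    # Point estimate per dimension: posterior mean across draws (an assumption;
    # a median or other summary could be used instead).
    point_estimates = [mean(draw[k] for draw in scale_draws) for k in range(n_dims)]
    return [discount * s for s in point_estimates]


# Hypothetical draws of a 2-dimensional scale vector from the first fit.
first_fit_draws = [[1.0, 4.0], [2.0, 6.0], [3.0, 8.0]]
fixed_scales = discounted_scales(first_fit_draws, discount=0.5)

# Data for the second fit, where the scales are known data rather than parameters.
second_fit_data = {"K": len(fixed_scales), "sigma_fixed": fixed_scales}
```

The second fit would then declare the scales in its data block instead of estimating them, which is exactly why the whole procedure costs two fits per data set.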

However, this is wasteful and janky.

Is it possible to use transformations within the model to supply a multivariate normal with scale-invariant variance parameters?

Welcome to the Stan forum, @stan-n00b.

It’s called “empirical Bayes” :-). Though it’s usually done a little differently.

If the models come from the same population of models, you can just fit a hierarchical model using a subset of models to estimate population parameters and then plug in point estimates of those for future models. How well that works depends on how homogeneous the population is and how well you can estimate the population-level parameters.

I’m not sure what you mean by scale-invariant variance parameters. You mean a scale-invariant prior on variance?

> I’m not sure what you mean by scale-invariant variance parameters. You mean a scale-invariant prior on variance?

Probably! My hope is that I could have a parameter that varied between 0 and 1 to adjust ‘regularization’, which I could use and interpret regardless of the data. This would be useful in the case of refitting the model to many heterogeneous datasets.

I’m wondering if there’s some trick via the transformed parameters block which might allow me to do this without much of a performance hit, or perhaps an alternative distribution to the multivariate normal for this purpose.
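
To make the 0-to-1 idea concrete, here is a minimal Python sketch of one possible mapping from a regularization strength to a prior scale. The function `prior_scale`, the `reference_scale` argument, and the odds-style formula are all assumptions chosen for illustration, not something established in this thread. The mapping is only scale-invariant relative to whatever `reference_scale` is supplied, so it leaves open where that reference comes from when the quantities are latent.

```python
import math


def prior_scale(strength, reference_scale):
    """Map a regularization strength in [0, 1] to a prior scale.

    strength = 0   -> infinite scale (flat prior)
    strength = 0.5 -> reference_scale
    strength = 1   -> zero scale (zero variance)

    reference_scale is a hypothetical typical magnitude of the quantity
    being regularized; it is an input here, not something this code derives.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    if reference_scale <= 0.0:
        raise ValueError("reference_scale must be positive")
    if strength == 0.0:
        return math.inf
    # Odds-style mapping: the scale shrinks monotonically as strength grows.
    return reference_scale * (1.0 - strength) / strength


# The same strength gives proportionally the same prior on different scales.
small = prior_scale(0.25, reference_scale=1.0)
large = prior_scale(0.25, reference_scale=10.0)
```

Inside a Stan program the same arithmetic could in principle live in a transformed data or transformed parameters block, with `strength` passed in as data, though whether that performs well for a given model is untested here.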