I am working with a model that requires a very long compute time (weeks without within-chain parallelization), and I am slowly adding complexity in the covariate structure. I am now ready to add a very large random effect (i.e. many levels–like 200,000) to deal with some nuisance variation, but I don’t expect my estimates for the existing parameters to change much.
My question is whether there is a way to pass the mass matrix from the fitted model without the random effect as an initial guess for the part of the mass matrix in the new model that corresponds to the previous model. My concerns are that:
I don’t trust myself to implement this properly, and
I wonder whether the new model will lose “memory” of the initial guess that I pass for the known part of the mass matrix as it adapts the terms corresponding to the new random effect (which represent the majority of parameters in the model!).
Any advice much appreciated.
Edited to add: If it’s not possible to pass a partial mass matrix, I realize that I can help things a bit by using offset/multiplier to rescale the fixed effects and random effect means so that their posteriors correspond approximately to standard normals. This would achieve the same effect, I think, as passing in the diagonal elements of the mass matrix for those parameters. If there’s a way to achieve this for standard deviations with half-normal priors and/or for the levels of random effect variables, I’d love to learn about it.
Yes, this is the same effect… and no, you cannot give a partial mass matrix. Going via offset/multiplier is a good approach here.
To deal with half-normal or any other constrained stuff you should declare the parameter yourself on the unconstrained space (so positive things go to the log scale). Then you will need to add the Jacobian adjustments manually so that you can still write your priors on the constrained space. With that implemented you can again work directly with the offset/multiplier approach.
Just want to make sure I’m getting this right (I have a nagging worry that I’m somehow doing this backwards). If I have a parameter sigma declared as:
parameters{
real<lower=0> sigma;
…
model{
sigma ~ normal(0,2); // this is the prior
…
with posterior distribution D, then what I need is:
parameters{
real<offset=mean(log(D)), multiplier=sd(log(D))> log_sigma;
…
model{
sigma = exp(log_sigma);
sigma ~ normal(0,2); // this is the prior
target += log_sigma; // this is the Jacobian adjustment
…
Thanks. Does the offset/multiplier syntax accept vectors of offsets and multipliers (which would enable me to apply this technique to standardize the posteriors of random effect terms with many levels)?
Vectorized offset/multiplier seems to work quite well, and in my use case this strategy effectively brings down the tree depth in the early part of adaptation. Presumably this will ultimately translate into a nontrivial wall time saving.