Strongly Informative Priors and Standardizing Transformations?

I have an inference problem with strongly informative priors.

I know that with vanilla regression models and weakly informative priors, the best practice is to transform each variable so that its mean and sd match the unit normal. This improves inference performance. Then you just pick the usual priors that are weakly informative on the unit-normal scale.

What about in my case? With strongly informative priors, should I still do the unit normal transform? If so, how do I appropriately translate my priors from the original scale to the transformed scale?

To be more concrete: Let’s say I have independent data x, dependent data y, latent parameters z, and priors z_mu, z_sigma. So z ~ normal(z_mu, z_sigma), y = f(x,z). z_mu and z_sigma are derived from the literature in my field. If I normalized x and y by their empirical mean and standard deviation, is it possible to rescale z_mu and z_sigma accordingly? How? Using mean(x), mean(y), sd(x), sd(y)? Can I do all this without touching the model, or would this adjustment require handling Jacobians?

Thanks very much!

So you have (and I’ll make up a set of strongly informed priors):

data{
  int n ;
  vector[n] x ;
  vector[n] y ;
}
parameters{
  vector[n] z ;
  real z_mu ;
  real<lower=0> z_sigma ;
}
model{
  z_mu ~ normal(10,.1) ; // strongly-informed prior here
  z_sigma ~ weibull(2,3) ; // strongly-informed prior here
  z ~ normal(z_mu,z_sigma) ; // structural "prior"
  y ~ f(x,z) ; // where f is presumably defined in a user-defined functions block
}

So you could focus just on getting the hyperparameters for z such that the sampler is seeing quantities in the unit-normal realm. For a normal prior on the location hyperparameter z_mu, the analytically obvious transform is a shift and scale, but for priors on the scale hyperparameter z_sigma you might have to play with what scaling gets the PDF into the right range (here I would, in R, run hist(rweibull(1e5,2,3)*q, br=100), varying q until the resulting histogram places values mostly in the 0-1 range).
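
For example, a minimal R sketch of that trial-and-error check (the candidate values of q here are just illustrative, not a recommendation):

# draws from the original, literature-scale weibull(2,3) prior
draws <- rweibull(1e5, shape = 2, scale = 3)
# try candidate scalings q and check how much of the mass lands below 1
for (q in c(1, 0.5, 0.8/3)) {
  cat("q =", round(q, 2), "-> proportion of scaled draws below 1:", mean(draws * q < 1), "\n")
}
hist(draws * (0.8/3), breaks = 100) # roughly weibull(2, 0.8); bulk of mass in 0-1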

Now, since z_mu is unbounded and the transform needed to get it onto a generally unit-scale domain is just a shift and scale, you can use the wonderful offset=...,multiplier=... syntax. The same is coming for bounded variables soon, but in the meantime we have to do things by hand:

...
parameters{
  vector[n] z ;
  real<offset=10,multiplier=.1> z_mu ; 
  real<lower=0> z_sigma_ ; //note suffix to denote this is to be transformed (just a personal syntax of mine)
}
transformed parameters{
  real<lower=0> z_sigma  = z_sigma_* (3/.8) ;
}
model{
  z_mu ~ normal(10,.1) ; // same prior as before 
  z_sigma_ ~ weibull(2,.8) ; // with TP, implies z_sigma ~ weibull(2,3) 
  z ~ normal(z_mu,z_sigma) ; // structural "prior"
  y ~ f(x,z) ; // where f is presumably defined in a user-defined functions block
}
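
As an aside on the Jacobian part of your question: no Jacobian adjustment is needed here, because the prior is placed on z_sigma_ itself and z_sigma is just a deterministic rescaling of it that never appears on the left of a ~. The comment in the model block works because multiplying a Weibull variate by a constant simply multiplies its scale parameter, so weibull(2,.8) scaled by 3/.8 is weibull(2,3). A quick Monte Carlo check of that claim in R, if you want to convince yourself:

# draws under the transformed parameterization vs. the original prior
a <- rweibull(1e5, shape = 2, scale = 0.8) * (3 / 0.8)
b <- rweibull(1e5, shape = 2, scale = 3)
# the two rows of quantiles should agree up to Monte Carlo noise
round(rbind(transformed = quantile(a, c(.05, .5, .95)),
            original    = quantile(b, c(.05, .5, .95))), 2)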

Now, z is also a parameter, so we probably want the sampler to see a unit-scale quantity associated with it too. I don’t think the offset/multiplier syntax accepts transformed parameters as values, otherwise we could simply do:

...
parameters{
  vector<offset=z_mu,multiplier=z_sigma>[n] z ;
  ...
}

and we’d be done. But that would involve a bunch of look-ahead by the parser to discern that z_sigma is a TP that gets its value from z_sigma_, etc., so as I say, I don’t think it’s currently supported. Instead we can do things manually:

...
parameters{
  vector[n] z_ ;
  real<offset=10,multiplier=.1> z_mu ; 
  real<lower=0> z_sigma_ ; 
}
transformed parameters{
  real<lower=0> z_sigma  = z_sigma_* (3/.8) ;
  vector[n] z = z_*z_sigma + z_mu ;
}
model{
  z_mu ~ normal(10,.1) ; // same prior as before 
  z_sigma_ ~ weibull(2,.8) ; // with TP, implies z_sigma ~ weibull(2,3) 
  z_ ~ std_normal() ; // structural "prior"
  y ~ f(x,z) ; // where f is presumably defined in a user-defined functions block
}

And that’s really it for this model. Sometimes you’ll see folks scaling y and/or x to unit scale, but that’s usually motivated by a desire to get the associated parameters into the unit-normal range (which we’ve already achieved here by transforming the parameters directly). Sometimes you’ll also see folks shifting y and/or x, and that’s usually about putting the intercept of a regression model into the unit-normal range (plus I think it can reduce correlation in the posterior in some multiple-regression contexts?).
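
If you did want to go the data-standardization route instead, the preprocessing itself is simple (a sketch in R, assuming x and y are plain numeric vectors and the data list names match the data block above); the harder part, as your question anticipates, is re-expressing z_mu and z_sigma on the standardized scale, and how to do that depends entirely on the form of f:

# center and scale the data before passing it to Stan (scale() does both)
x_std <- as.numeric(scale(x))
y_std <- as.numeric(scale(y))
stan_data <- list(n = length(y_std), x = x_std, y = y_std)
# keep the original means/sds around so posterior quantities can be mapped back
raw_scales <- list(x_mean = mean(x), x_sd = sd(x), y_mean = mean(y), y_sd = sd(y))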

This is very helpful, thank you Mike!
