Jeffreys prior for a vector variable

I am trying to model the Jeffreys prior for error variance in a heteroscedastic ANOVA design in rstan.

That is to say, \pi(\mu,\sigma_1^2,\dotsb,\sigma_C^2)\varpropto\Pi_{i=1}^{C}\sigma_i^{-2}.

Is the following setting correct?

target += -2 * sum(log(sigma));


where the vector variable is declared as

vector<lower=0>[C] sigma;  // standard deviation of the error term


It seems not, because my current codes return an MCMC divergence warning.

Thank you for the kind comments.

Your variable sigma is a scale (\sigma), not a variance (\sigma^2), but otherwise corresponds to the math you wrote down.

We usually recommend at least weakly informative priors, so I’m not so familiar with these reference priors. According to the Wikipedia on Jeffrey’s priors, the prior on scale parameters \sigma should be \displaystyle {} \propto \sigma^{-1}.

Getting the math right doesn’t mean you won’t get divergences from difficult-to-sample posterior geometry using a direct parameterization. Especially if there’s a weak prior and not much data.

We can provide more detailed help if you include the entire model and data you’re trying to sample. For example, why isn’t \mu getting a prior and is there enough data to fix it? I realize from the Wikipedia page that’s the reference prior for the location of a normal distribution with a fixed scale. I don’t know what the joint reference prior is for location plus scales.

Thank you, Bob, for the kind comment.
According the Wikipedia you mentioned, “Similarly, the Jeffreys prior for \ln\sigma^2=2\ln\sigma is also uniform”. So, I don’t think the mathematical expression \pi(\mu,\sigma_1^2,\dotsb,\sigma_C^2)\propto\prod_{i=1}^C\sigma_i^{-2} is incorrect. The complete code and data can be found on my previous post in the Stan forum here.

The difference is whether the uniformity is over \sigma or \sigma^2. The density p(\sigma) \propto 1 / \sigma is uniform over \sigma whereas p(\sigma^2) \propto 1 / \sigma^2 is uniform over \sigma^2. The former is uniform over scales whereas the latter is uniform over variances.

I don’t know anything about Jeffrey’s priors, so this is just about densities.

Also, you have to be careful in that Stan applies its own Jacobian correction for variables declared with <lower = 0>. For complete control, you can define unconstrained parameters and constrain and adjust them yourself.

@Bob_Carpenter is correct that the standard Jeffreys prior for a one-dimensional standard deviation is just \sigma^{-1} which corresponds to the Stan code

parameters {
vector<lower=0>[C] sigma;
...
}

model {
...
target += -sum(log(sigma));
...
}


That said multiple caveats need to be considered.

Firstly Jeffreys priors are typically terrible prior models unless the likelihood function is very informative. The posterior distributions that arise from Jeffreys priors can absolutely be pathological and frustrate Stan’s sampler resulting in divergences. If you’re seeing divergences then you may need to consider more informative prior models to suppress any pathological behavior.

Secondly be careful that Jeffreys priors aren’t all that consistent in higher-dimensional spaces. For example if sigma is a vector of scales corresponding to the diagonal elements of a covariance matrix then the joint Jeffreys prior will not, in general, be equal to the product of the component Jeffreys priors.

2 Likes

Hello, Dr. Betancourt.
Thanks for your enlightening comments. I want to confirm the Stan syntax of the Jeffreys prior for a scalar standard deviation or variance.

parameters {
real<lower=0> sigma;  // standard deviation of the error
}


I see you support the following code, which is consistent with this book and bridgesampling.

model {
// priors
target += - log(sigma);          // Jeffreys prior
}


However, I also found other code, e.g., OSF and even Stan User’s Guide. For now, I think the below syntax is incorrect.

model {
// priors
target += -2 * log(sigma);          // Jeffreys prior?
}


Jeffrey’s prior for a standard deviation parameter is target += -log(sigma).
The other code you found on OSF explains in the associated paper

But note that this is the prior for \sigma_\epsilon^2, not \sigma_\epsilon and these differ by a Jacobian adjustment. Unfortunately the code uses sigma instead of sigma_squared parameter so it is indeed incorrect.
The code from StackOverflowStan User Guide (EDIT: you edited your post while i was typing) is correct because it declares sigma_sq as a parameter and computes sigma = sqrt(sigma_sq).

2 Likes