# How do I write the Jeffreys prior for error variance (heteroscedasticity)?

I need to model the Jeffreys prior for the error variances in a heteroscedastic ANOVA design in rstan. The linear mixed-effects model is Y_{ij}=\mu+\sigma_i t_i+\sigma b_j+\epsilon_{ij},\ \ \ \ \epsilon_{ij}\overset{\text{ind.}}{\sim}\mathcal{N}(0,\ \sigma_i^2)\ \ \ \ \text{for } i=1,\dotsb,a;\ j=1,\dotsb,n,

where \mu is the grand mean, t_i are the standardized treatment effects, b_j are the standardized subject-specific random effects, a is the number of treatment levels, and n is the number of subjects. Heterogeneity of variance means that the error variance \sigma_i^2 is not constant across the levels of the treatment.

The scale \sigma of the random effects is the square root of the average error variance, \sigma^2=\frac{1}{a}\sum_{i=1}^{a}\sigma_i^2.

The priors are (1) \pi(\mu,\ \sigma_1^2,\dotsb,\sigma_a^2)\varpropto\prod_{i=1}^{a}{\sigma_i^{-2}},

(2) b_{j}\mid g_b\ \overset{\text{ind.}}{\sim}\mathcal{N}(0,\ g_b),

(3) g_b\sim\text{Scale-inv-}\chi^2(1,\ h_b^2),

(4) t_{i}\mid g_t\ \overset{\text{ind.}}{\sim}\mathcal{N}(0,\ g_t),

(5) g_t\sim\text{Scale-inv-}\chi^2(1,\ h_t^2).
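For intuition about priors (3) and (5): a \text{Scale-inv-}\chi^2(\nu,\ s^2) distribution is equivalent to \text{Inv-Gamma}(\nu/2,\ \nu s^2/2), so prior draws can be simulated as \nu s^2 divided by a \chi^2_\nu variate. A minimal sketch in Python/NumPy (my own illustration, not part of the model code; the function name is made up):

```python
import numpy as np

def scale_inv_chi2_draws(nu, s2, size, rng):
    """Draw from Scale-inv-chi^2(nu, s2) as nu * s2 / chi^2_nu."""
    return nu * s2 / rng.chisquare(nu, size=size)

rng = np.random.default_rng(1)
# prior draws for g_b with nu = 1 and h_b^2 = 1, as in prior (3)
g_b = scale_inv_chi2_draws(nu=1, s2=1.0, size=100_000, rng=rng)

print(g_b.min() > 0)     # support is strictly positive
print(np.median(g_b))    # heavy-tailed: theoretical median here is 1 / median(chi^2_1), about 2.2
```

With \nu = 1 this prior is very heavy-tailed (it has no finite mean), which is worth keeping in mind when the sampler struggles.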

My current stan model file is

data {
  int<lower=1> N;        // number of subjects
  int<lower=1> C;        // number of conditions
  vector[C] Y[N];        // responses
  real tcrit;            // critical value
  real<lower=0> ht;      // scale parameter of the standardized treatment effects
  real<lower=0> hb;      // scale parameter of the standardized subject-specific random effects
}

parameters {
  real mu;                           // grand mean
  vector<lower=0>[C] sigma_squared;  // error variances
  real<lower=0> gt;                  // variance of the standardized treatment effects
  real<lower=0> gb;                  // variance of the standardized subject-specific random effects
  vector[C] t;                       // treatment effects
  vector[N] b;                       // subject-specific random effects
}

transformed parameters {
  vector<lower=0>[C] sigma;  // error sd
  real<lower=0> va;          // average error variance
  vector<lower=0>[C] eta;    // sd of the treatment effects
  real<lower=0> tau;         // sd of the subject-specific random effects
  sigma = sqrt(sigma_squared);
  va = sum(sigma_squared) / C;
  eta = sigma * sqrt(gt);
  tau = sqrt(va * gb);
}

model {
  // linear mixed-effects model
  for (i in 1:N) {
    Y[i] ~ normal(mu + t + b[i], sigma);
  }
  t ~ normal(0, eta);
  b ~ normal(0, tau);

  // priors
  // mu ~ implicit uniform prior           // Jeffreys prior (flat on mu)
  target += -2 * sum(log(sigma));          // Jeffreys prior on the error variances
  gt ~ scaled_inv_chi_square(1, ht);
  gb ~ scaled_inv_chi_square(1, hb);
}

// compute HDI boundaries
generated quantities {
  vector<lower=0>[C] se;
  matrix[C, 2] hdi_se;
  vector[C] mu_t;  // condition means

  se = sigma / sqrt(N);
  mu_t = mu + t;
  hdi_se[, 1] = mu_t - tcrit * se;  // lower bound
  hdi_se[, 2] = mu_t + tcrit * se;  // upper bound
}


The data example is

mat <- matrix(c(10,6,11,22,16,15,1,12,9,8,
13,8,14,23,18,17,1,15,12,9,
13,8,14,25,20,17,4,17,12,12), ncol = 3,
dimnames = list(paste("s", c(1:10), sep=""),
c("Level1", "Level2", "Level3")))
N <- nrow(mat)
C <- ncol(mat)
alpha <- 0.05
tcrit <- qt(1 - alpha/2, df = N - 1)
datlist <- list(Y = mat, C = C, N = N, tcrit = tcrit, ht = 1, hb = 1)


The current sampling results return warnings such as

"There were 249 divergent transitions after warmup. See ~~ to find out why this is a problem and how to eliminate them."

My guess is that the Jeffreys prior for the error variances is not correctly specified, i.e., that target += -2 * sum(log(sigma)); does not implement \pi(\mu,\ \sigma_1^2,\dotsb,\sigma_a^2)\varpropto\prod_{i=1}^{a}{\sigma_i^{-2}}.
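For reference, the identity I intended is \log\prod_{i=1}^{a}\sigma_i^{-2}=-\sum_{i=1}^{a}\log\sigma_i^2=-2\sum_{i=1}^{a}\log\sigma_i. A quick numeric check of that algebra (illustration only, not part of the model):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_squared = rng.uniform(0.1, 5.0, size=3)  # three positive error variances
sigma = np.sqrt(sigma_squared)

jeffreys_log_prior = -np.sum(np.log(sigma_squared))  # log of prod sigma_i^{-2}
stan_increment = -2 * np.sum(np.log(sigma))          # target += -2 * sum(log(sigma))

print(np.isclose(jeffreys_log_prior, stan_increment))  # True
```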

Any suggestion?

Thank you for the kind comments. Best.

I’m not familiar with Jeffreys priors, but I wonder if you need to play with alternatives to your “parameterization” of the variable t. Specifically, you have implemented an unusual (in the sense that it’s rare to see, not that it might not make sense) mixed parameterization whereby the location is non-centered but the scale is centered. I don’t know under what circumstances such a mixed parameterization might make sense, but for the general centered/non-centered distinction, see here. Generally, when the data are highly informative, centered parameterizations tend to sample better, and when the data are less informative, non-centered parameterizations tend to sample better.

To try location-and-scale-centered, you’d do:

  for (i in 1:N) {
    Y[i] ~ normal(t + b[i], sigma);
  }
  t ~ normal(mu, eta);


And to try location-and-scale-non-centered, you’d do:

  for (i in 1:N) {
    Y[i] ~ normal(mu + t .* eta + b[i], sigma);  // .* for elementwise vector product
  }
  t ~ std_normal();
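Both forms imply the same marginal distribution for t’s contribution; only the geometry the sampler sees changes. A quick Monte Carlo sketch of that equivalence in Python (my own illustration; the mu and eta values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
mu, eta = 2.0, 0.5
n = 200_000

t_centered = rng.normal(mu, eta, size=n)       # t ~ normal(mu, eta)
t_noncentered = mu + eta * rng.normal(size=n)  # t = mu + eta * z, z ~ std_normal()

# both recover the same mean and sd up to Monte Carlo error
print(abs(t_centered.mean() - t_noncentered.mean()) < 0.01)  # True
print(abs(t_centered.std() - t_noncentered.std()) < 0.01)    # True
```

Since the two are equivalent distributions, switching between them is purely a computational choice.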