How do I write the Jeffreys prior for error variance (heteroscedasticity)?

I need to specify the Jeffreys prior for the error variances in a heteroscedastic ANOVA design in rstan. The linear mixed-effects model is

Y_{ij}=\mu+\sigma_i t_i+\sigma b_j+\epsilon_{ij},\ \ \ \ \epsilon_{ij}\overset{\text{ind.}}{\sim}\mathcal{N}(0,\ \sigma_i^2)\ \ \ \ \text{for } i=1,\dotsb,a;\ j=1,\dotsb,n,

where \mu is the grand mean, t_i are the standardized treatment effects, b_j are the standardized subject-specific random effects, a is the number of levels, and n is the number of subjects. Heterogeneity of variance means that the error variance \sigma_i^2 is not constant across the levels of the treatment. The scale \sigma is defined through the average error variance,

\sigma^2=\frac{1}{a}\sum_{i=1}^{a}\sigma_i^2.

The priors are (1) \pi(\mu,\ \sigma_1^2,\dotsb,\sigma_a^2)\varpropto\prod_{i=1}^{a}{\sigma_i^{-2}},

(2) b_{j}\mid g_b\ \overset{\text{ind.}}{\sim}\mathcal{N}(0,\ g_b),

(3) g_b\sim\text{Scale-inv-}\chi^2(1,\ h_b^2),

(4) t_{i}\mid g_t\ \overset{\text{ind.}}{\sim}\mathcal{N}(0,\ g_t),

(5) g_t\sim\text{Scale-inv-}\chi^2(1,\ h_t^2).
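Under these priors, the implied conditional scales of the unstandardized effects are

\text{sd}(\sigma_i t_i\mid g_t)=\sigma_i\sqrt{g_t},\ \ \ \ \text{sd}(\sigma b_j\mid g_b)=\sigma\sqrt{g_b},

which correspond to eta and tau in the transformed parameters block below.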

My current Stan model file is

data {
  int<lower=1> N;        // number of subjects
  int<lower=1> C;        // number of conditions
  vector[C] Y[N];        // responses
  real tcrit;            // critical value
  real<lower=0> ht;      // scale parameter of the standardized treatment effects
  real<lower=0> hb;      // scale parameter of the standardized subject-specific random effects
}

parameters {
  real mu;                           // grand mean
  vector<lower=0>[C] sigma_squared;  // error variances
  real<lower=0> gt;                  // variance of the standardized treatment effects
  real<lower=0> gb;                  // variance of the standardized subject-specific random effects
  vector[C] t;                       // treatment effects
  vector[N] b;                       // subject-specific random effects
}

transformed parameters {
  vector<lower=0>[C] sigma;  // error sd
  real<lower=0> va;          // average error variance
  vector<lower=0>[C] eta;    // sd of the treatment effects
  real<lower=0> tau;         // sd of the subject-specific random effects
  sigma = sqrt(sigma_squared);
  va = sum(sigma_squared) / C;
  eta = sigma * sqrt(gt);
  tau = sqrt(va * gb);
}

model {
  // linear mixed-effects model
  for (i in 1:N) {
    Y[i] ~ normal(mu + t + b[i], sigma);
  }
  t ~ normal(0, eta);
  b ~ normal(0, tau);

  // priors
  // mu ~ implicit uniform prior           // Jeffreys prior
  target += -2 * sum(log(sigma));          // Jeffreys prior
  gt ~ scaled_inv_chi_square(1, ht);
  gb ~ scaled_inv_chi_square(1, hb); 
}

// compute HDI boundaries
generated quantities {
  vector<lower=0>[C] se;
  matrix[C,2] hdi_se;
  vector[C] mu_t;  // condition means

  se = sigma / sqrt(N);
  mu_t = mu + t;
  hdi_se[,1] = mu_t - tcrit * se;  // lower bound
  hdi_se[,2] = mu_t + tcrit * se;  // upper bound
}

The data example is

mat <- matrix(c(10,6,11,22,16,15,1,12,9,8,
                13,8,14,23,18,17,1,15,12,9,
                13,8,14,25,20,17,4,17,12,12), ncol = 3,
               dimnames = list(paste("s", c(1:10), sep=""),
                               c("Level1", "Level2", "Level3")))
N <- nrow(mat)
C <- ncol(mat)
alpha <- 0.05
tcrit <- qt(1 - alpha/2, df = N - 1)
datlist <- list(Y = mat, C = C, N = N, tcrit = tcrit, ht = 1, hb = 1)
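I compile and sample roughly as follows ("model.stan" is just a placeholder name for the Stan program above):

library(rstan)

# "model.stan" is a placeholder for the Stan program shown above
fit <- stan(file = "model.stan", data = datlist,
            chains = 4, iter = 2000, seed = 1)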

Sampling this model currently returns warnings such as

"There were 249 divergent transitions after warmup. See ~~ to find out why this is a problem and how to eliminate them."

My guess is that the Jeffreys prior for the error variances is not correctly specified, i.e., that target += -2 * sum(log(sigma)); does not implement \pi(\mu,\ \sigma_1^2,\dotsb,\sigma_a^2)\varpropto\prod_{i=1}^{a}{\sigma_i^{-2}}.
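For reference, the log density of that prior, written in terms of the sampled parameters \sigma_i^2, is

\log\pi(\mu,\ \sigma_1^2,\dotsb,\sigma_a^2)=-\sum_{i=1}^{a}{\log\sigma_i^2}+\text{const.}=-2\sum_{i=1}^{a}{\log\sigma_i}+\text{const.},

which is what I intended the target += statement to add.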

Any suggestions?

Thank you in advance for any comments. Best.

I’m not familiar with Jeffreys priors, but I wonder if you need to try an alternative parameterization of the variable t. Specifically, you have implemented an unusual (in the sense that it’s rare to see, not that it might not make sense) mixed parameterization whereby the location is non-centered but the scale is centered. I don’t know under what circumstances such a mixed parameterization might make sense, but for the general centered/non-centered distinction, see here. Generally, when the data are highly informative, centered parameterizations tend to sample better, and when the data are less informative, non-centered parameterizations tend to sample better.

To try location-and-scale-centered, you’d do:

  for (i in 1:N) {
    Y[i] ~ normal(t + b[i], sigma);
  }
  t ~ normal(mu, eta);

And to try location-and-scale-non-centered, you’d do:

  for (i in 1:N) {
    Y[i] ~ normal(mu + t .* eta + b[i], sigma);  // .* for the elementwise product of the two vectors
  }
  t ~ std_normal();
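(If you try either of these, the generated quantities block needs matching changes: with the non-centered version, mu_t = mu + t .* eta; with the centered version, t already absorbs the grand mean, so mu_t = t.)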