Question about bulk and tail ESS

I am using a translated and scaled simplex, as described in section 1.7 of the Stan User's Guide (v2.24), to center coefficients in a relatively simple multivariate response model. Specifically,

parameters {
  ...
  simplex[N_forms] beta_raw[N_trait];  // one simplex of form effects per trait
  vector[N_trait] beta_scale;          // per-trait scale for the form effects
}

transformed parameters {
  vector[N_trait] mu_form[N_forms];

  // translate and scale the simplex so each trait's form effects are centered
  for (i in 1:N_trait) {
    for (j in 1:N_forms) {
      mu_form[j][i] = beta_scale[i] * (beta_raw[i][j] - 1.0 / N_forms);
    }
  }
  ...
}

model {
  ...
  for (i in 1:N_trait) {
    beta_raw[i] ~ dirichlet(one);
    beta_scale[i] ~ normal(0.0, 1.0);
  }
  ...
}
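
(For what it's worth, the centering works because each beta_raw[i] is a simplex: sum_j mu_form[j][i] = beta_scale[i] * (1 - N_forms * (1/N_forms)) = 0, so each trait's form effects sum to zero by construction.)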

I combine mu_form with another linear term to model multivariate mean vectors, but what I’m really interested in is the covariance/correlation matrix associated with those vectors.

My code runs fine and the diagnostics look good, except that I get a warning about small bulk and tail ESS. When I examine the bulk and tail ESS for each parameter in the model, I find that the warning is driven by beta_raw and beta_scale, whose bulk and tail ESS are very small (6–20). The values for mu_form are a bit small too (< 300), as are those for the other linear term (around 150).

BUT the bulk and tail ESS for all of the parameters I’m interested in are > 400.

Do I need to worry about the small bulk and tail ESS for my “nuisance” parameters, or am I safe to ignore them? My understanding is that the mean/median and quantiles for these parameters may be unreliable, but since I’m not interested in estimating them, can I safely ignore the warning?

Kent

You should worry about them, unfortunately. Even though they’re nuisance parameters, you’re still integrating them out, so you want to sample them well.

I just went over to have a look at section 1.7. There are a bunch of other ways to impose a sum-to-zero constraint. It’s probably worth trying those other parameterizations because they might behave quite a bit differently.
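
For example, the K-1 degrees-of-freedom version from that section would look roughly like this in your notation (a sketch only, not tested; the prior on beta_raw is left to you):

parameters {
  ...
  vector[N_forms - 1] beta_raw[N_trait];
  ...
}

transformed parameters {
  vector[N_trait] mu_form[N_forms];

  for (i in 1:N_trait) {
    for (j in 1:(N_forms - 1)) {
      mu_form[j][i] = beta_raw[i][j];
    }
    // the last element is determined by the sum-to-zero constraint
    mu_form[N_forms][i] = -sum(beta_raw[i]);
  }
  ...
}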

Damn! I was worried that might be the case. I’ll try one of the other formulations and hope that I have better luck.

Thank you for the quick response.

Kent

Quick update: Contrast coding seems to work pretty well.

parameters {
  ...
  vector[N_forms-1] beta_raw[N_trait];  // free effects; the last form is the reference
  ...
}

transformed parameters {
  vector[N_trait] mu_form[N_forms];

  for (i in 1:N_trait) {
    for (j in 1:(N_forms-1)) {
      mu_form[j][i] = beta_raw[i][j];
    }
    mu_form[N_forms][i] = 0.0;  // pin the reference form at zero
  }
  ...
}

model {
  ...
  for (i in 1:N_trait) {
    beta_raw[i] ~ normal(0.0, 1.0/sqrt(N_forms));
  }
  ...
}

No warnings about low bulk or tail ESS for any of the parameters. Given how well that seems to work, I think I’ll try the K-1 degrees-of-freedom approach and see whether it’s the sum-to-zero constraint that’s the problem. (Sum-to-zero is a bit easier to interpret in my application.)

Kent


Further update: Sum-to-zero using the K-1 degrees-of-freedom approach works just fine. Is there any reason that K-1 would generally work better than the Dirichlet, or is it specific to the data/problem?

Kent


I’m not sure, but that sounds right. That would at least explain why there are so many versions of this in the manual.
