Convert correlation matrix to non-centered parameterization?

loo

#1

The attached model2.stan (2.8 KB) sometimes has divergent transitions (depending on the seed). The way I’ve tried to address this is to increase lkj’s eta parameter. I started with lkj_corr_cholesky(2), but that had lots of divergent transitions. Currently, I have lkj_corr_cholesky(5). There is not much data in the model so I want to prefer correlations of zero, but I don’t want to overdo it with too strong a prior. I guess I could try increasing to 6. The thing is, I feel dubious about setting the eta parameter high enough to avoid divergent transitions. Is that a defensible way to set a prior?

Or can I improve sampling by using a non-centered parameterization? I’ve looked at examples of non-centered models, but so far, none of them quite match what I’m trying to do. How could I convert my model to be non-centered? Or does that even make sense for a correlation matrix?

Here are the scripts and data needed to run the model:

f4.R (394 Bytes)
modelUtil.R (5.9 KB)
rawData.csv (75.3 KB)

(Download everything, place all files in the same directory, and execute f4.R.)

Thank you.


#2

There’s an example of a multivariate non-centering in the manual in the optimization chapter. There’s a section on non-centered parameterizations, which has a subsection on the multivariate case.

I don’t think an lkj_correlation(6) prior is that strong, but I could be wrong. The thing to do if you’re worried is test prior sensitivity. Try it with an eta of 6 and with an eta of 1 (even if there are divergences) and see if the priors look different enough to care. Usually we impose weakly informative priors that don’t have much impact on the posterior.


#3
parameters {
  vector[NFACETS]     theta_raw[NPA];    // latent score of PA by facet
  real threshold1;
  real threshold2;
  vector<lower=0>[NFACETS] alpha;
  cholesky_factor_corr[NFACETS] thetaCorChol;
}


transformed parameters {
  for (pa in 1:NPA) {
    theta[pa] = thetaCorChol * theta_raw[pa];
  }
}

model {
  thetaCorChol ~ lkj_corr_cholesky(5);
  for (pa in 1:NPA) {
    theta_raw[pa] ~ normal(0, 1);
  }

Changed your code to non-centered.
You don’t have any hyper-prior variance restrictions on your multivariate normal distribution.
I’d do that too, use something more restrictive than cauchy(0, 1); first.


#4

Thank you very much for providing the translation. I have tried to read through the manual section 28.6
Reparameterization as Bob recommended, but it is much easier to see the transformation applied to my own code with which I am intimately familiar.

The way I set up the model, the variances are fixed to 1 and the measurement error is reflected by the alpha parameters like a 2PL item response model. Upon further consideration, I think this is the wrong way to go because the alpha values are hard to interpret. I plan to fix alpha to 1.0 and free the variances. This should be an equivalent model, but I anticipate that it will be easier to interpret. Once I switch things around, then yes, I will add priors for the variances. Did I understand your comment correctly?


#5

To fix the variance to 1.0 is okay, when you know a-priori that they are 1.0.

You might print a summary statistic of your model and look which parameters have low Neff and compare both ways with the loo package. Do a pairs plot about correlations and funnels and traceplot to check the mixing.
I think there is no a-prior way of right and wrong. I change my models various times to see what’s work and what not.


#6

@Bob_Carpenter It seems unintuitive that the transformed parameters block goes before the model block, in this case. I know that there is some rationale for the block order (see the manual 6.1. Overview of Stan’s Program Blocks). However, I think the funky ordering was the main thing that confused me about how to transform the model to the non-centered parameterization. Why can’t theta go in the generated quantities block instead? Oh, because theta is used in the model block later on in the code?

rcat[cmp,ff] ~ categorical_logit(
cmp_probs(alpha[ff],
theta[pa1[cmp],ff],
theta[pa2[cmp],ff],
threshold1, threshold2));


#7

Think about the transformed parameters block as public
and the model block as private. Public things are recorded while sampling.
Theta is going to used in model and generated quantities. If you define theta in the model
block, it is private and cannot be used in the generated quantities block.
Additionally the model block contains sampling statements.


#8

I’ve tested out the non-centered stuff. Wow, sampling works so much better. Awesome.


#9

Exactly. The design motivation is that the parameters block defines what is actually be sampled by NUTS. Then the transformed parameters block has transforms of those parameters. Both the parameters and transformed parameters may then be used in the models (though the transformed parameters need Jacobians if they’re used on the left side of sampling statements and the transform is nonlinear).

We’re looking into completely rethinking all the blocks and everything, but for now, that order’s required.

The bad part is that sometimes you want a local variable to define a transformed parameter and also want that local variable in the model block or generated quantities block. Stan doesn’t let you do that because it currently tangles the output with where something is declared.