Cholesky decomposition

Hi,

Can somebody advise which Cholesky decomposition is more effective in Stan? Can the models be optimized even more?

Also, which model, if any, falls under centered/non-centered parameterization? I would like to understand how these terms apply in the multidimensional case.

Linas
chol1.stan (1.4 KB)

chol2.stan (1.4 KB)


Also, which model, if any, falls under centered/non-centered parameterization?

The centered parameterization for a multivariate normal is when you do:

y ~ multi_normal(zero, Sigma);

The non-centered is when you do:

z ~ normal(0, 1);
y = cholesky_decompose(Sigma) * z;
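
To see how the pieces fit together, here's a minimal sketch of the non-centered version as a complete Stan program (the names K, Sigma, z, y are mine, not from the attached models, and Sigma is assumed to be fixed data):

data {
  int<lower=1> K;
  cov_matrix[K] Sigma;
}
parameters {
  vector[K] z;                     // auxiliary standard-normal variables
}
transformed parameters {
  // if z ~ normal(0, 1) and L * L' = Sigma, then L * z ~ multi_normal(0, Sigma)
  vector[K] y = cholesky_decompose(Sigma) * z;
}
model {
  z ~ normal(0, 1);
}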

I don’t know the specifics of the naming offhand. I always go back to Betancourt’s divergences case study if I get confused about this stuff: Diagnosing Biased Inference with Divergences

Can somebody advise which Cholesky decomposition is more effective in Stan?

I think hierarchical models without much data tend to produce divergences under the centered parameterization, so there’s a data dependence there. The easiest way to figure out which parameterization works better for you is to try them both, but if your centered parameterization is running fine (no divergences and diagnostics looking good), I don’t think there’s a huge call to switch. I might be wrong.

The non-centered parameterizations always trip me up for a bit when I’m reading models, so I prefer to keep them centered for readability if I can.

Cholesky decomposition

I think you’re using your Cholesky factors correctly here.

Can the models be optimized even more?

sigma = 2.5 * tan(sigma_unif); // sigma ~ cauchy(0, 2.5)
for (k in 1:K) tau[k] = 2.5 * tan(tau_unif[k]); // tau ~ cauchy(0, 2.5)

Why not just use the cauchy syntax here? I’m not sure this is totally right given the constraints on tau and sigma.

And cauchy priors aren’t all roses (Asymmetric Gaussian Hierarchical Model - #11 by bgoodri). They’re serious when they talk about those heavy tails :P (draw some numbers from cauchy(0.0, 1.0) and just look at them).


Ignore my comment on the tan thing vs. cauchy. Had the usefulness of that explained to me in another thread ^^.

This is fine when presenting them, but coding them in Stan (or in BUGS/JAGS) should be based on data size and specificity. With little data, you need the non-centered parameterization in order to draw unbiased samples from the posterior—the funnel-shaped posterior you get from the centered parameterization will defeat Euclidean HMC, Gibbs, and Metropolis.
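
To make the funnel concrete, here’s a toy sketch (my own names, eight-schools-style, not from this thread) of the non-centered parameterization for a scalar hierarchical effect:

data {
  int<lower=1> J;
  vector[J] y;
  vector<lower=0>[J] sigma;
}
parameters {
  real mu;
  real<lower=0> tau;                         // priors on mu and tau omitted for brevity
  vector[J] theta_raw;                       // standard-normal raw effects
}
transformed parameters {
  vector[J] theta = mu + tau * theta_raw;    // implies theta ~ normal(mu, tau)
}
model {
  theta_raw ~ normal(0, 1);
  y ~ normal(theta, sigma);
}

The centered version would declare theta as a parameter and write theta ~ normal(mu, tau) directly; with little data, the joint posterior over theta and tau then takes the funnel shape described above.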

Is there some definition of specificity? Basically I am looking for a rule of thumb for when centered works better and when non-centered works better. So far I am hearing that non-centered is recommended when the sample size is small, while centered is recommended when the sample size is large. Are there quantitative definitions of large/small/specific/non-specific?

Linas

Check out Betancourt and Girolami’s paper. There’s an arXiv version. They plot curves based on posterior standard deviation vs. the breakeven point.

Hey, could you link me to the thread where the tan vs. cauchy difference was explained to you? I have the same question.

I forget where, but check “Reparameterizing the Cauchy”, page 339 of the manual.
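
For what it’s worth, my reading of that section (the names below are mine, not the manual’s): if sigma_unif gets a uniform distribution on (0, pi/2), then 2.5 * tan(sigma_unif) follows a half-Cauchy with scale 2.5, so something like

parameters {
  real<lower=0, upper=pi()/2> sigma_unif;        // implicit uniform(0, pi/2) prior
}
transformed parameters {
  real<lower=0> sigma = 2.5 * tan(sigma_unif);   // implies sigma ~ cauchy(0, 2.5), restricted to sigma > 0
}

lets the sampler work on a bounded parameter instead of chasing the Cauchy’s heavy tails directly. For a Cauchy on the whole real line you’d use the interval (-pi/2, pi/2) instead.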