Constraining the trace of a covariance matrix


I am interested in an inference problem where the posterior distribution is over positive semi-definite matrices whose trace is equal to 1. I have not begun working on this, but before I do, I would be interested to hear any opinions from those experienced users about how best to implement this, as in, what to put in the parameters section of the stan program. I understand there are constrained native types for cholesky factors, correlations, and covariance matrices.

I once wrote pair of inverse functions that transform between \mathbb{R}^{d^2-1} and the set of d\times d unit-trace positive semi-definite operators with the constraint based on stick-breaking (see here).
Would implementing this in stan be a useful approach, or is there a simpler way? Here is a blurb copied from the docstring of vec_to_state():

The unit trace condition is met by using a stick-breaking like trick, where
a given value of x doesn’t directly store a cholesky factor matrix element,
but rather, it stores (after being shrunk) the fraction of the remaining
budget of the trace until 1. Note that the trace of a positive operator
is the square Hilbert-Schmidt norm of its cholesky factor, so this is a
2-norm version of stick breaking.

Thanks for any help.


Could you just define the square of the scale vector to be a simplex, give it a Dirichlet prior, and proceed as normal?

Something like

data {
  int P;
transformed data {
  vector[P] alpha = rep_vector(1.0/P, P);
parameters {
  simplex[P] sqr_scale;
  corr_matrix[P] Omega;
transformed parameters {
  matrix[P, P] Sigma = quad_form_diag(Omega, sqrt(sqr_scale));
model {
  target += dirichlet_lpdf(sqr_scale  | alpha);
  target += lkj_corr_lpdf(Omega | 3);

In R then

compiled_model <- stan_model("trace_sums_to_1.stan")

fit <- sampling(compiled_model, data = list(P = 5), chains = 1)

sum(diag(matrix(unlist(, pars = "Sigma")[1000,]), 5, 5)))

> 1

This puts no weight on scales of 1 or 0, which might not be desirable.



Thanks, this is exactly what I am looking for! (though I will need to muck around a bit to simulate complex matrices).


That’s exactly what we do with correlation matrices under the hood, by the way. The full definition with Jacobian is in the manual in the chapter on constrained parameter transforms.



That’s exactly what we do with correlation matrices under the hood, by the way.

Cool. It seems the main difference is that cholesky stick-breaking operates per-row, while trace constraining stick-breaking operates on all entries together.

While reading through, I found a minor typo in the subsection “Correlation Matrix Inverse Transform”: It reads that tanh maps into (0,1), rather than (-1,1).


Nice, Bob keeps an issue open for changes like this: , so go ahead and post it there!


You’re right. I’ll fix for next release.

Thanks for following up with an issue, @ihincks—that way, it won’t get lost.