Constraining the trace of a covariance matrix


#1

I am interested in an inference problem where the posterior distribution is over positive semi-definite matrices whose trace is equal to 1. I have not begun working on this, but before I do, I would be interested to hear any opinions from those experienced users about how best to implement this, as in, what to put in the parameters section of the stan program. I understand there are constrained native types for cholesky factors, correlations, and covariance matrices.

I once wrote pair of inverse functions that transform between \mathbb{R}^{d^2-1} and the set of d\times d unit-trace positive semi-definite operators with the constraint based on stick-breaking (see here).
Would implementing this in stan be a useful approach, or is there a simpler way? Here is a blurb copied from the docstring of vec_to_state():

The unit trace condition is met by using a stick-breaking like trick, where
a given value of x doesn’t directly store a cholesky factor matrix element,
but rather, it stores (after being shrunk) the fraction of the remaining
budget of the trace until 1. Note that the trace of a positive operator
is the square Hilbert-Schmidt norm of its cholesky factor, so this is a
2-norm version of stick breaking.

Thanks for any help.


#2

Could you just define the square of the scale vector to be a simplex, give it a Dirichlet prior, and proceed as normal?

Something like

data {
  int P;
}
transformed data {
  vector[P] alpha = rep_vector(1.0/P, P);
}
parameters {
  simplex[P] sqr_scale;
  corr_matrix[P] Omega;
}
transformed parameters {
  matrix[P, P] Sigma = quad_form_diag(Omega, sqrt(sqr_scale));
}
model {
  
  target += dirichlet_lpdf(sqr_scale  | alpha);
  target += lkj_corr_lpdf(Omega | 3);
}

In R then

compiled_model <- stan_model("trace_sums_to_1.stan")

fit <- sampling(compiled_model, data = list(P = 5), chains = 1)

sum(diag(matrix(unlist(as.data.frame(fit, pars = "Sigma")[1000,]), 5, 5)))

> 1

This puts no weight on scales of 1 or 0, which might not be desirable.

Cheers,
Jim


#3

Thanks, this is exactly what I am looking for! (though I will need to muck around a bit to simulate complex matrices).


#4

That’s exactly what we do with correlation matrices under the hood, by the way. The full definition with Jacobian is in the manual in the chapter on constrained parameter transforms.

Neat!


#5

That’s exactly what we do with correlation matrices under the hood, by the way.

Cool. It seems the main difference is that cholesky stick-breaking operates per-row, while trace constraining stick-breaking operates on all entries together.

While reading through, I found a minor typo in the subsection “Correlation Matrix Inverse Transform”: It reads that tanh maps into (0,1), rather than (-1,1).


#6

Nice, Bob keeps an issue open for changes like this: https://github.com/stan-dev/stan/issues/2552 , so go ahead and post it there!


#7

You’re right. I’ll fix for next release.

Thanks for following up with an issue, @ihincks—that way, it won’t get lost.