Identification of Matrix Normal Distribution

Hi,

I have a question about imposing identification constraints on the matrix normal covariances. When fitting a multi-output Gaussian process, the Stan User’s Guide recommends fixing \alpha to 1:

"We should set α to 1.0 because the parameter is not identified unless we constrain trace(C) = 1."
(https://mc-stan.org/docs/2_27/stan-users-guide/fit-gp-section.html)
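The quoted reason can be written out in one line. Assuming the GP part of the covariance is \alpha^2 K_\rho (a kernel K_\rho with unit marginal variance over the N inputs) and C is the M \times M covariance across outputs, the N \times M matrix of function values F depends on \alpha and C only through a Kronecker product:

$$
F \sim \mathcal{MN}\left(\mu,\ \alpha^2 K_\rho,\ C\right)
\iff
\operatorname{vec}(F) \sim \mathcal{N}\left(\operatorname{vec}(\mu),\ C \otimes \alpha^2 K_\rho\right),
\qquad
C \otimes \alpha^2 K_\rho = \left(\tfrac{1}{d}\,C\right) \otimes \left(d\,\alpha^2 K_\rho\right)\ \text{for all } d > 0.
$$

So moving a factor d > 0 from C into \alpha^2 leaves the likelihood unchanged, and any one of \alpha = 1, \operatorname{trace}(C) = 1, \operatorname{trace}(C) = M, or C_{11} = 1 pins down a single value of d.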

I would like to know whether there is a justification for this, such as an article I can cite. I ask because the articles I have reviewed suggest either fixing the trace of a covariance matrix to its dimension (the length of its diagonal) or fixing the first entry of a covariance matrix to 1 (see, for example, https://arxiv.org/pdf/1703.08882.pdf).
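The constraints listed above differ only in which representative of the same likelihood they pick. A minimal NumPy sketch, assuming the GP kernel is written as \alpha^2 K_\rho with K_\rho having unit diagonal and applying each constraint to the output covariance C: the helper `normalize`, its rule names, and the numbers are hypothetical, chosen only for illustration.

```python
import numpy as np

def normalize(alpha2, C, rule):
    """Hypothetical helper: rescale (alpha2, C) -> (alpha2 * d, C / d) so that
    the product alpha2 * C (and hence the likelihood) is unchanged, with d
    chosen to satisfy one identification constraint."""
    M = C.shape[0]
    if rule == "alpha_one":    # alpha = 1 (the choice in the Stan User's Guide)
        d = 1.0 / alpha2
    elif rule == "trace_one":  # trace(C) = 1
        d = np.trace(C)
    elif rule == "trace_dim":  # trace(C) = M, the dimension of C
        d = np.trace(C) / M
    elif rule == "first_one":  # C[0, 0] = 1
        d = C[0, 0]
    else:
        raise ValueError(f"unknown rule: {rule}")
    return alpha2 * d, C / d

# Hypothetical unnormalized values: a positive-definite 3 x 3 output covariance
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
C = A @ A.T + np.eye(3)
alpha2 = 2.5

for rule in ["alpha_one", "trace_one", "trace_dim", "first_one"]:
    a2, C_n = normalize(alpha2, C, rule)
    print(f"{rule:>9}: alpha^2 = {a2:.3f}, trace(C) = {np.trace(C_n):.3f}, "
          f"C[0,0] = {C_n[0, 0]:.3f}")
```

All four rules return pairs with the same product \alpha^2 C; they differ only in where the overall scale is stored, which mainly affects convenience and how the parameters are interpreted.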

Thanks


I don’t understand those models very well, but since nobody else answered, I will give it a try:

I would expect all of the constraints to have the same goal: remove one degree of freedom from the model to make it identified. So in principle, it is not that important which one you choose.
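A quick numerical check of the "one degree of freedom" argument: a minimal NumPy sketch that evaluates the matrix normal log density and shows it does not change when a factor d is moved from C into \alpha^2. The kernel, length scale, and data are hypothetical. SciPy's `scipy.stats.matrix_normal` computes the same density; the hand-written version just keeps the example dependency-free.

```python
import numpy as np

def matrix_normal_logpdf(X, mean, U, V):
    """Log density of X ~ MN(mean, U, V): X is n x p, U is the n x n row
    covariance and V is the p x p column covariance."""
    n, p = X.shape
    R = X - mean
    _, logdet_U = np.linalg.slogdet(U)
    _, logdet_V = np.linalg.slogdet(V)
    # tr(V^{-1} R^T U^{-1} R), using linear solves instead of explicit inverses
    quad = np.trace(np.linalg.solve(V, R.T @ np.linalg.solve(U, R)))
    return -0.5 * (n * p * np.log(2 * np.pi) + p * logdet_U + n * logdet_V + quad)

# Hypothetical setup: 5 inputs, 2 outputs, squared-exponential kernel K_rho
# with unit marginal variance (plus a small jitter for numerical stability)
x = np.linspace(0.0, 1.0, 5)
rho, alpha = 0.3, 1.5
K_rho = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / rho**2) + 1e-6 * np.eye(5)
C = np.array([[1.0, 0.3],
              [0.3, 2.0]])
X = np.random.default_rng(1).standard_normal((5, 2))
mean = np.zeros((5, 2))

base = matrix_normal_logpdf(X, mean, alpha**2 * K_rho, C)
for d in [0.1, 2.0, 10.0]:
    # Move a factor d from C into alpha^2: the log density stays the same
    shifted = matrix_normal_logpdf(X, mean, d * alpha**2 * K_rho, C / d)
    print(d, base, shifted)
```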

I think the practical appeal of fixing \alpha rather than constraining the trace or a specific element of the C matrix is that in Stan one can get a big speedup by working with the Cholesky decomposition of the covariance (or correlation) matrix, which is available as a built-in type. Leaving C unconstrained lets us use this type directly. If we constrained C, we would need to a) work out how to parametrize such a constrained matrix and b) implement the relevant transforms and Jacobians. However, I don’t have an article to cite (maybe you can cite the Stan User’s Guide; it is basically a book :-)
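The parametrization described above can be sketched in NumPy rather than Stan, assuming a setup similar to the User's Guide example: the GP kernel is fixed at unit marginal variance (\alpha = 1), and the output covariance is built as C = diag(tau) Omega diag(tau) from a positive scale vector `tau` and the Cholesky factor of a correlation matrix Omega (in Stan, a `cholesky_factor_corr` parameter combined with `diag_pre_multiply`). The names `tau` and `Omega` and all numbers are hypothetical.

```python
import numpy as np

N, M = 6, 3                                   # hypothetical: 6 inputs, 3 outputs
x = np.linspace(0.0, 1.0, N)
rho = 0.2                                     # hypothetical length scale
# Squared-exponential kernel with alpha fixed to 1 (unit marginal variance), plus jitter
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / rho**2) + 1e-6 * np.eye(N)
L_K = np.linalg.cholesky(K)

# Output covariance C = diag(tau) @ Omega @ diag(tau), built from an unconstrained
# positive scale vector and the Cholesky factor of a correlation matrix
tau = np.array([0.5, 1.0, 2.0])               # hypothetical per-output scales
Omega = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])           # hypothetical correlation matrix
L_Omega = np.linalg.cholesky(Omega)
B = np.diag(tau) @ L_Omega                    # same role as diag_pre_multiply(tau, L_Omega) in Stan
C = B @ B.T                                   # implied output covariance

# Non-centered draw: with Z ~ iid N(0, 1), F = L_K Z B^T ~ MN(0, K, C)
Z = np.random.default_rng(2).standard_normal((N, M))
F = L_K @ Z @ B.T
print(np.diag(C))                             # equals tau**2, since diag(Omega) = 1
```

Because the overall scale of C lives entirely in `tau`, no extra constraint on `L_Omega` or `tau` is needed once \alpha is fixed. Constraining trace(C) instead would mean restricting `tau`, since trace(C) equals sum(tau**2).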

Additionally, the length-scale parameter is often the most problematic to fit well (see e.g. https://betanalpha.github.io/assets/case_studies/gaussian_processes.html#32_Exploring_the_Marginal_Likelihood_Function), so avoiding fitting it likely lets you sidestep some of those issues.

Best of luck with your model!

Thanks very much, Martin! This explanation helps me a lot.