What is the point of the multi_normal_cholesky parametrization?

Under all conditions. By starting with a pre-factored covariance matrix, the evaluation is quadratic in dimension rather than cubic. It saves on both the determinant calculation (because Cholesky is triangular, it’s only diagonal) and the quadratic form that involves the inverse of the covariance matrix. The autodiff is similarly sped up because it follows the basic evaluation.

There’s even more advantage within Stan because the way we parameterize a dense covariance matrix is using a Cholesky factor under the hood (N choose 2 unconstrained elements below the diagonal, and N diagonal elements which must be positive, so they’re log transformed). So it saves a lot of work in just creating a well-formed covariance matrix.

We should probably hint that it’s both more numerically stable and more efficient. We go into that fairly early in the User’s Guide in the regression chapter.