Model comparison between independent normals and multivariate normals

Hi everyone, I have a question about how I can compare two models, to assess whether introducing a correlation structure improves predictive performance. For the sake of exposition I will simplify them a lot, but let’s assume they are the following:


model {
  y ~ normal(mu, sigma); 
} 

and

model {
  y ~ multi_normal_cholesky(mu, L); 
} 

Where y is a vector of dimension N. I wanted to use the loo package to compare the two models, but I quickly realised (as also confirmed in this post) that this is in fact not possible. What alternatives do you suggest? Thank you in advance for your answers.

That answer you linked seems to be misleading. See equations and code for loo for multivariate normal in Efficient leave-one-out cross-validation for Bayesian non-factorized normal and Student-t models, and more background in
Bayesian Leave-One-Out Cross-Validation Approximations for Gaussian Latent Variable Models.

For cross-validation of y to make sense, there need to be repeated y, e.g.,

data {
   array[N] vector[K] y;
}

In this case, log_lik needs to contain the log likelihood of each of the N values for y[n]. The code will look like this:

generated quantities {
  vector[N] log_lik;
  for (n in 1:N) {
    log_lik[n] = normal_lpdf(y[n] | mu, sigma);  // for model 1
    // log_lik[n] = multi_normal_cholesky_lpdf(y[n] | mu, L);  // for model 2
  }
}

Use the first line inside the loop for the first model and the commented-out second line for the second model.

No, they don’t need to be repeated like that. See the papers I linked.

Hi @avehtari and @Bob_Carpenter, thank you both very much for your help. I checked out the references and had already found this useful post that shares the result for correlated multivariate normal observations. I wanted to ask @avehtari if:

  1. the code that @Bob_Carpenter gave, the sort of “naive” approach, is in fact correct and efficient
  2. you have an example of code implementing the result from your paper for the correlated normal model

:-).

It’s as efficient as you’re going to be able to write this in Stan. Our normal_lpdf function isn’t configured to return vectors—it reduces by summing, which streamlines autodiff.

The generated quantities calculations are much, much cheaper than the model block evaluations because (a) they only get evaluated once per iteration, not once per leapfrog integrator step, and (b) they work with C++ primitives (double and int) rather than autodiff variables. So it’s faster to keep the model block vectorized and then repeat the work down in the generated quantities block.

One might think it’d be more efficient to move the generated quantities block to transformed parameters and then have the likelihood defined as follows:

transformed parameters {
  vector[N] log_lik;
  for (n in 1:N) {
    log_lik[n] = normal_lpdf(y[n] | mu, sigma);
  }
}
model {
  target += sum(log_lik);
}

This is actually less efficient than just taking the model block to be

y ~ normal(mu, sigma);

and redefining the log likelihood in generated quantities.
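
For concreteness, here is a minimal end-to-end sketch of that recommended layout for the first (independent-normal) model; the data and parameters blocks are assumed here just for illustration:

data {
  int<lower=1> N;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  y ~ normal(mu, sigma);  // vectorized likelihood, evaluated at every leapfrog step
}
generated quantities {
  vector[N] log_lik;  // pointwise log likelihood, recomputed once per iteration with doubles
  for (n in 1:N) {
    log_lik[n] = normal_lpdf(y[n] | mu, sigma);
  }
}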

I just realized we didn’t put the LOO-friendly version in the User’s Guide. I’ve been reluctant to add code that isn’t useful in all of our interfaces, but I think LOO’s also available in ArviZ, so maybe I should add a note reiterating this.

Doesn’t something need to be repeated? I don’t see how we can do leave-one-out cross-validation on a set containing only a single element.

I just looked at the Gaussian paper you linked and the very first equation (1) has a product over what I am calling N (but the paper calls n). Then the definition of LOO in (4) has a matching loop.

How do I cross-validate over y when I only have the following?

data {
  vector[N] y;
}
model {
  y ~ multi_normal(mu, Sigma);
}

Are you doing something like leaving out one of the components of y and trying to predict from others given Sigma? But then I’m not even sure how you fit mu or Sigma in a setup like this with only a single observation—it’s going to be heavily prior driven.

If N is 1, then there is no cross-validation. If N > 1, then cross-validation is possible by leaving out elements of the vector y.

In Efficient leave-one-out cross-validation for Bayesian non-factorized normal and Student-t models (Computational Statistics), we call this non-factorized. Often it could be factorized by conditioning y on latent values, which then have a multivariate distribution (see Figure 1 for the different model possibilities). Sometimes that factorization is not possible, or it would lead to problems in computation. The LOO mean and sd are given by equations (5) and (6) (see also (8) and (9)), and the LOO log predictive density is given by equation (7). One of the reviewers wrote that this is so trivial that everyone knows it, but thanks for being additional evidence that there was a need to write this paper.

It’s correct, but it answers a different question than the one you asked in your first post. You stated "Where y is a vector of dimension N." and Bob answered as if y were N vectors of dimension K.

The available code uses brms and R: https://github.com/paul-buerkner/psis-non-factorized-paper (the repository contains all the materials related to the paper). We don’t have pure Stan code, but I hope you can figure out how to write equations (7), (8), and (9) in Stan code. I would be happy to write example Stan code for this, but I don’t have time for that, at least for the next four weeks (I might also be away from Stan Discourse for the next four weeks).
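
As a starting point, here is a sketch of what those equations could look like in a generated quantities block for a single observed vector y ~ multi_normal(mu, Sigma), using the standard precision-matrix identities for leave-one-out Gaussian conditionals. It assumes the data and parameters from the snippet above (vector[N] y, mu, and Sigma); this is only an illustration, not code from the paper’s repository:

generated quantities {
  // leave-one-out predictive mean, variance, and log density for each element of y
  vector[N] log_lik;
  {
    matrix[N, N] Sigma_inv = inverse_spd(Sigma);  // precision matrix
    vector[N] g = Sigma_inv * (y - mu);
    for (n in 1:N) {
      real sigma2_loo = 1 / Sigma_inv[n, n];   // LOO predictive variance
      real mu_loo = y[n] - sigma2_loo * g[n];  // LOO predictive mean
      log_lik[n] = normal_lpdf(y[n] | mu_loo, sqrt(sigma2_loo));
    }
  }
}

The resulting log_lik matrix can then be passed to loo in place of the usual factorized pointwise log likelihood, as described in the paper.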

Thanks, @avehtari. I was guessing it’s because you were thinking of this as non-factorized. I finally clicked through to the @jsocolar post that was confusing, and that seems to indicate that @Tommaso11397 has a single multivariate y rather than repeated multivariate y. I think the confusion from @jsocolar is that with multivariate normal, you can work out the joint density one element at a time because the conditionals are simple due to the normal’s lovely marginalization properties. If you only have one multivariate normal observation y ~ multinormal(mu, Sigma), then you have to be able to work out the conditional distribution of y[n] given y[1:n-1]. At least that’s easy with multivariate normals—you just chop down the covariance to the pieces you have. Aki’s paper cites an even more clever way to do this that avoids having to factor each of the sub-matrices independently.
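
For reference, the “chop down the covariance” step is the standard Gaussian conditioning identity: for any partition of the indices of y into blocks a and b,

$$
y_a \mid y_b \sim \mathcal{N}\!\left(\mu_a + \Sigma_{ab}\Sigma_{bb}^{-1}(y_b - \mu_b),\; \Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}\right).
$$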

Will this not give different answers depending on the ordering? I didn’t see this discussed in the paper and don’t know enough about normals to work this out in my head.

The ordering does not matter for the LOO equations.

My confused post was just that—confused. At the time I was either unaware of Bürkner, Gabry & Vehtari’s non-factorized LOO solution or else maybe I was unaware that their method still proceeds by forming the pointwise log-likelihood matrix. I’ll try to add a clarifying update on the other thread.

Thanks, @jsocolar. I was also confused by the question, as I wasn’t expecting cross-validation over the elements of a single vector rather than over multiple vectors! Good thing @avehtari’s here :-).
