Sampling over an uncertain variance-covariance matrix

Hi all,

I’ve been fiddling with phylogenetic mixed models from evolutionary biology recently. In these models, you have a “random” effect featuring a VCV (derived from an estimated phylogeny) that’s known up to a constant. They’re mechanically identical to animal models (of which Diogo Melo has a Stan example here: https://github.com/diogro/stanAnimal/blob/master/animalModelUni.stan). Because the phylogeny itself is an upstream estimand and phylogenies are commonly estimated using Bayesian methods (e.g. BEAST, MrBayes), you end up with a posterior distribution of phylogenetic trees that captures the uncertainty in your phylogenetic estimate. Therefore, you implicitly have a posterior distribution of VCVs, and it would be nice to include that uncertainty in downstream analyses.
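For concreteness, the fixed-tree version of the model looks roughly like this in Stan (a minimal sketch in the spirit of the animal-model example linked above; the names A, sigma_phy, and sigma_res are just mine for illustration):

data {
  int<lower=1> N;            // number of species
  vector[N] y;               // one trait value per species
  matrix[N, N] A;            // phylogenetic VCV, known up to a constant
}
parameters {
  real mu;                   // grand mean of the trait
  real<lower=0> sigma_phy;   // scale of the phylogenetic effect
  real<lower=0> sigma_res;   // residual standard deviation
}
model {
  mu ~ normal(0, 10);
  sigma_phy ~ normal(0, 1);
  sigma_res ~ normal(0, 1);
  // species trait values are jointly normal with tree-structured covariance
  y ~ multi_normal(rep_vector(mu, N),
                   square(sigma_phy) * A
                     + diag_matrix(rep_vector(square(sigma_res), N)));
}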

You can do that in BUGS (described here: https://bmcevolbiol.biomedcentral.com/articles/10.1186/1471-2148-12-102 with code in the appendix), but I was wondering whether there was an equivalent way to average over this phylogenetic uncertainty in Stan. I’ve skimmed through the manual, but I couldn’t spot anything. Clearly an array of matrices is possible, but I couldn’t work out a way of sampling over them.


I recently posted an approach I have used here: Non-static covariance matrices?

Thanks. I’ll have a look at that later and see if I can get my head around it.

You talked about covariance matrices, but I’m not sure what you want to average and where. In Stan, you define a density, then you can define derived quantities.

If it’s something you can do in BUGS then we can probably do it in Stan, too, unless it involves their cut operation or combinatorially intractable discrete parameters.

Often if you’re doing inference over discrete structures like trees it’s going to be the latter and Stan won’t be able to fit it at all (but then neither will BUGS be able to navigate that combinatorial landscape with discrete sampling in most cases).

I couldn’t find the appendix to that paper to check out the BUGS code.

What @Bob_Carpenter said, but for phylogenetic trees you could just run the Stan model once per sample from BEAST/whatever. Sure, it burns CPU time, but it should be fine on a cluster. If the results are consistent, you don’t have to worry about it.

I’ve previously been running these analyses in MCMCglmm, and just running it repeatedly over different posterior samples from the BEAST output is how I’ve been doing it. I can certainly get that to run in Stan; I was just wondering if there was a more efficient way of doing it.

@Bob_Carpenter I wouldn’t be surprised if the discreteness is an issue; it’s part of the reason that I asked. I’ll try explaining what I’m trying to get Stan to do again, as I obviously wasn’t very clear the first time.

Upstream, a phylogenetic tree is estimated. This gives a posterior distribution of k trees. In evolutionary biology there are several model-based methods to map from a phylogenetic tree to a covariance matrix between n species under a particular evolutionary model. Because the mapping is tree-specific, you can apply it to all of the trees in the posterior to get a k x n x n matrix array, which captures the uncertainty in the covariance between the species traits driven by the uncertainty in the estimated tree. I think talking about averaging over this uncertainty was possibly the wrong terminology, but essentially I want to account for it.
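In Stan terms, the data I’d end up with would look something like this (names purely illustrative):

data {
  int<lower=1> K;        // number of trees sampled from the BEAST posterior
  int<lower=1> N;        // number of species
  vector[N] y;           // trait values, one per species
  matrix[N, N] V[K];     // one model-derived covariance matrix per tree
}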

But looking in more detail at the BUGS code, which is here (a direct link to the appendix this time: https://static-content.springer.com/esm/art%3A10.1186%2F1471-2148-12-102/MediaObjects/12862_2012_2257_MOESM1_ESM.pdf), it does look like it’s doing direct categorical sampling, so Stan probably just can’t fit it. Thanks anyway though. As @sakrejda says, running it over a cluster on random samples from the BEAST posterior isn’t too much of a hassle!


You can code this model in Stan and it should probably work a lot better than trying to sample the K. Here’s the BUGS:

Y[1:Nspec] ~ dmnorm(mu[],TAU[,])   # multivariate normal parameterized by precision TAU
...
K ~ dcat(p[])                      # discrete index over the candidate trees
for (i in 1:Nspec) {
  for (j in 1:Nspec) {
    TAU[i,j] <- tau*invA[i,j,K]    # precision: tau times the K-th inverse VCV
  }
}

Here’s what you’d do in Stan to marginalize out the K:

vector[K] lp;
for (k in 1:K)
  lp[k] = multi_normal_lpdf(y | mu, tau * invA[ , , k]);   // log density under tree k
target += log_sum_exp(lp);   // marginalize out the discrete k on the log scale

It’d be more efficient if you coded invA as:

matrix[N, N] invA[K];

and then you can use invA[k] rather than invA[ , , k], which won’t require a copy.
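Putting the pieces together, the whole thing would look something like this (the parameters and priors here are just a sketch, with a scalar grand mean mu for illustration):

data {
  int<lower=1> K;          // number of trees from the upstream posterior
  int<lower=1> N;          // number of species
  vector[N] y;             // trait values
  matrix[N, N] invA[K];    // one scaled covariance matrix per tree
}
parameters {
  real mu;                 // grand mean
  real<lower=0> tau;       // overall covariance scale
}
model {
  vector[K] lp;
  mu ~ normal(0, 10);
  tau ~ normal(0, 1);
  for (k in 1:K)
    lp[k] = multi_normal_lpdf(y | rep_vector(mu, N), tau * invA[k]);  // invA[k] avoids a copy
  target += log_sum_exp(lp);   // marginalizes out the discrete tree index
}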

If you want the probability of each k in each iteration, save the lp values and calculate the log probability of k as lp[k] - log_sum_exp(lp) (that’s just the log-scale form of p[k] / sum(p)). Stan has a convenient built-in, so if you really, really want to sample that k (they write everything in caps, but I’ve converted to our conventions), you’d have

generated quantities {
  int k = categorical_logit_rng(lp);   // one draw of the tree index per iteration
}

but you’re better off working in (log) expectation with lp - log_sum_exp(lp) as I explain in the first example in the manual chapter on latent discrete parameters.
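For the expectation version, one way (a sketch, assuming mu, tau, and invA are declared as above; prob_k is my name) is to compute lp in transformed parameters so it’s available downstream; softmax(lp) is exactly exp(lp - log_sum_exp(lp)):

transformed parameters {
  vector[K] lp;
  for (k in 1:K)
    lp[k] = multi_normal_lpdf(y | mu, tau * invA[k]);
}
model {
  target += log_sum_exp(lp);        // marginalize out the tree index
}
generated quantities {
  vector[K] prob_k = softmax(lp);   // per-iteration probability of each tree
}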


That’s amazing. Thanks very much.

With this thread being brought up again here, I’ve been rereading this answer, and I have a quick question that might make me look like an idiot. But in the spirit of it being better to temporarily look like an idiot in order to learn something than to not and remain confused, I’ll ask it anyway. It’s about this:

lp[k] = multi_normal_lpdf(y | mu, tau * invA[ , , k]);

Here, as I understand it, the right-hand side is just the log sum of the likelihoods for each datapoint under the given multivariate normal distribution with means mu and scaled phylogenetic covariance matrix tau * invA[ , , k].

So in very simple terms, is that just sum(log(likelihood))? If so, what I’m not 100% clear on, and wanted to check, is why, in the marginalisation step

target += log_sum_exp(lp);

each element of lp, which as I understand it is a sum(log(likelihood)) term under the kth covariance matrix, is being exponentiated before being logged and summed again.

In section 15.2 of the Stan manual, there’s an example which, as far as I can tell, is exactly analogous, but there you have an additional factor of -log(K) (actually T in the example, but to keep the notation straight I’ll stick with K) from the uniform prior across each value of the discrete parameter. I don’t fully understand where that’s gone in this case, as there should still be an implicit uniform prior across each covariance matrix…

Why is lp[k] = multi_normal_lpdf(y | mu, tau * invA[,,k]) rather than lp[k] = -log(K) + multi_normal_lpdf(y | mu, tau * invA[,,k])?


You don’t need the -log(K) if it’s constant. With a uniform prior, lp[k] - log(K) subtracts the same log(K) from every element, and log_sum_exp(lp - log(K)) = log_sum_exp(lp) - log(K), so it only shifts the target by a constant, which doesn’t change the posterior.

If y is an array of multivariate observations and you want to mix at the observation level (which is the usual thing to do), then that’d be:

vector[K] lp;
for (n in 1:N) {
  for (k in 1:K)
    lp[k] = multi_normal_lpdf(y[n] | mu, tau * invA[ , , k]);
  target += log_sum_exp(lp);   // marginalize k separately for each observation
}

If I hadn’t used that strategy initially on our bboards and in person, I’d know almost nothing about stats!


Thanks again!