Estimating covariance terms in a multilevel model

I constructed a three-level multilevel model in Stan using the non-centered parameterization. With this approach, my random parameter beta_0jk[j] looks like the following:

transformed parameters  {
  // Varying intercepts
  real beta_0jk[Npars];
  real beta_0k[Nk];
  
  // Varying intercepts definition
  // Level-3 (10 level-3 random intercepts)
  for (k in 1:Nk) {
    beta_0k[k] = beta_0 + u_0k[k];
  }
  // Level-2 (100 level-2 random intercepts)
  for (j in 1:Npars) {
    beta_0jk[j] = beta_0k[countryLookup[j]] + u_0jk[j];
  }
}
model {
  // Random effects distribution
  u_0k  ~ normal(0, sigma_u0k);
  u_0jk ~ normal(0, sigma_u0jk);

  sigma_u0jk ~ student_t(3, 0, 10);
  sigma_u0k ~ student_t(3, 0, 10);
}

However, I also want to incorporate more random parameters, which I model like this:

transformed parameters  {
  // Varying intercepts
  real beta_0jk[Npars];
  real beta_0k[Nk];
  real beta_1jk[Npars];
  real beta_1k[Nk];
  
  // Varying intercepts definition
  // Level-3 (10 level-3 random intercepts)
  for (k in 1:Nk) {
    beta_0k[k] = beta_0 + u_0k[k];
    beta_1k[k] = beta_1 + u_1k[k];
  }
  // Level-2 (100 level-2 random intercepts)
  for (j in 1:Npars) {
    beta_0jk[j] = beta_0k[countryLookup[j]] + u_0jk[j];
    beta_1jk[j] = beta_1k[countryLookup[j]] + u_1jk[j];
  }
}
model {
  // Random effects distribution
  u_0k  ~ normal(0, sigma_u0k);
  u_0jk ~ normal(0, sigma_u0jk);
  u_1k  ~ normal(0, sigma_u1k);
  u_1jk ~ normal(0, sigma_u1jk);

  sigma_u0jk ~ student_t(3, 0, 10);
  sigma_u0k ~ student_t(3, 0, 10);
  sigma_u1jk ~ student_t(3, 0, 10);
  sigma_u1k ~ student_t(3, 0, 10);
}

Here, beta_1jk is my additional random parameter. This works fine and I can estimate my model, which gives reasonable results. However, I wonder whether I made a large simplification by not modelling the covariances between u_0k and u_1k and between u_0jk and u_1jk. Does anyone know how I can model this covariance, and whether it will have a large impact on my model results?

Thanks in advance for the help!

Your prior on these parameters is independent, but that doesn’t mean your posterior will be. You can compute the posterior covariances on these parameters and see.

If you’re happy with your model, I don’t see any reason to go about putting covariance in your priors.

I was just thinking more about this. Jeez, I messed this question up, my bad.

Let’s start over :P.

So if your model is:

$\beta_i \sim N(\mu, \sigma)$

Then the betas are conditionally independent given $\mu$ and $\sigma$. That doesn't mean the $\beta$s are independent or uncorrelated. Play with this R code:

u <- rnorm(10000, 0, 1)      # shared group-level draw
a <- rnorm(10000, 0, 1) + u
b <- rnorm(10000, 0, 1) + u
plot(a, b)         # Correlated!
plot(a - u, b - u) # Independent!

So a hierarchical prior implies correlations.

If your prior on the betas actually does make them independent, then the posterior still likely won't be.

You can also put some sort of multivariate normal prior on things if you want to specify the covariance directly (with multi_normal, or its Cholesky-factored form). You'd do that if you put a Gaussian process prior on some parameters, for instance, if you think the parameters you're estimating should vary in some smooth way.
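To make that concrete, here's a minimal sketch of what it could look like for the level-3 effects in your model, using the usual non-centered construction with an LKJ prior on the correlation matrix (the level-2 effects would work the same way; the names z_k, L_Omega_k, and sigma_k are made up for the illustration):

parameters {
  matrix[2, Nk] z_k;                  // standardized level-3 effects
  cholesky_factor_corr[2] L_Omega_k;  // Cholesky factor of the 2x2 correlation matrix
  vector<lower=0>[2] sigma_k;         // scales, playing the role of (sigma_u0k, sigma_u1k)
}
transformed parameters {
  // row 1 plays the role of u_0k, row 2 of u_1k; the two are now correlated
  matrix[2, Nk] u_k = diag_pre_multiply(sigma_k, L_Omega_k) * z_k;
}
model {
  to_vector(z_k) ~ normal(0, 1);
  L_Omega_k ~ lkj_corr_cholesky(2);   // shape 2 puts mild mass toward zero correlation
  sigma_k ~ student_t(3, 0, 10);
}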


@bbbales2 Thanks for your explanation! It is really helpful!

My final question: do you know how I can compute the posterior covariance between the random parameters (beta_0jk and beta_1jk in my example)? Is that just a matter of computing the covariance over the posterior draws of these parameters, or is there another way?

That’s the way to do it. If you had a closed form of your posterior to compute the covariance from then you wouldn’t need to sample!
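For example, in R (a sketch, assuming fit is your stanfit object and the parameter names from your model above):

draws <- rstan::extract(fit)  # list with one array of posterior draws per parameter
# posterior covariance between the intercept and slope of level-2 unit j = 1
cov(draws$beta_0jk[, 1], draws$beta_1jk[, 1])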

There was another post yesterday where someone was modeling the covariance directly as a parameter (using LKJ priors and such). The covariance in this case was a hyperparameter of some random effects, and the covariance was something of particular interest for the application. It’s pretty well written I think: Reparameterizing to avoid low E-BFMI warning - #5 by mdonoghoe .

To go beyond that your best bet would be looking through BDA3: http://www.stat.columbia.edu/~gelman/book/


@bbbales2 Great, thanks for your help.

Are there specific settings where putting LKJ priors on the random effects is more suitable than my approach?

Maybe this: https://www.youtube.com/watch?v=DzaQiJG3RpA&t=162s&list=PLuwyh42iHquU4hUBQs20hkBsKSMrp6H0J&index=8 and this: https://www.youtube.com/watch?v=DPnLb5EaCkA&list=PLuwyh42iHquU4hUBQs20hkBsKSMrp6H0J&index=8

Other than that I’d head to BDA3 or the Gelman & Hill book and look around.


@bbbales2 I have one final question about computing the covariance from the posterior draws. Let's say I have two parameters, beta_0jk and beta_1jk, each with 15 jk combinations. The posterior draws of beta_0jk then form an $m \times 15$ matrix ($m$ is the number of draws), and the same holds for beta_1jk. How can I compute the covariance between beta_0jk and beta_1jk as just one number?

Besides that, I wonder whether I should report all the covariance terms when dealing with many random parameters (around ten). Do you have an opinion on that?

If beta_0jk is a length-15 vector and beta_1jk is a length-15 vector, then just concatenate matching draws of beta_0jk and beta_1jk together and compute the covariance of this combined parameter.
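In R that would look something like this (assuming beta_0jk and beta_1jk are your $m \times 15$ draw matrices):

both <- cbind(beta_0jk, beta_1jk)  # m x 30 draws of the combined parameter
Sigma <- cov(both)                 # 30 x 30 posterior covariance matrix
Sigma[1:15, 16:30]                 # 15 x 15 block of covariances between the two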

Ouch, if it were just a couple of parameters, I'd do it graphically. With ten, that's a problem. Is there any way you can talk about your uncertainty in terms of predictions? Even though posteriors are the things we're after, they're really hard to summarize and communicate.

So, if I understand correctly, you mean that I first have to take the mean over the draws for each element, to get a vector with 15 elements?

What exactly do you mean by predictions here? In the end I am indeed predicting a value, but I do not know how to take the covariance terms into account there. Besides that, I want to report the covariances just for completeness and maybe some interpretation; or do you think it is sufficient to report only the standard deviations?

No no, not that. Just to be clear, assuming these things are variables in R:

cov(beta_0jk[,1], beta_1jk[,2])

is the covariance between the 1st column of beta_0jk and the 2nd column of beta_1jk. I think I might be misunderstanding what you want to compute.

Covariance is usually a confusing thing to communicate because it’s the relationship between multiple things.

If you can boil down a complicated model to a simple prediction, then maybe you can make the result of your inference a simple statement about the uncertainty of a single thing, and then you don’t have to talk about covariances.

The result from Stan I am talking about is the following. My beta_0k, for example, is the individual-specific intercept for $k$ individuals. Therefore, I get an $m \times k$ matrix as output from Stan, with $m$ draws for beta_0k[1], beta_0k[2], etc. The same holds for beta_1k. How can I then get a single covariance for the two variables?

Maybe I had better report only the standard deviations, to avoid any confusion, right?

beta_0k and beta_1k are vectors of random variables. The covariance of these two things is a matrix.

If beta_0k and beta_1k are $m \times k$ matrices in R (extracted with rstan::extract), then you can compute the covariance as:

cov(beta_0k, beta_1k)

and that'll be a $k \times k$ covariance matrix.
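If what you're after is the covariance of matching pairs (beta_0k[i] with beta_1k[i]), those are the diagonal entries of that matrix. Collapsing them to one number is a reporting choice rather than anything canonical; for instance:

C <- cov(beta_0k, beta_1k)  # k x k cross-covariance of the columns
diag(C)                     # cov(beta_0k[, i], beta_1k[, i]) for each individual i
mean(diag(C))               # one possible single-number summary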

As long as it's reasonable to do this, sure (that is, as long as the covariances aren't too important in the context of the predictions you're making).
