Correlated varying intercepts -- can I use gr(cov = ?)?

Hi everyone, I’m looking for some advice.

I am a political scientist and am doing some research on democracy. I have 11 different dichotomous measures of democracy, each of which takes the value 0 if a given country in a given year was non-democratic and 1 if it was democratic. I’d like to fit a two-parameter IRT model to these data. The problem, however, is that the measures are not independent, since some draw on others for inspiration or to corroborate unfamiliar cases. As such, there is a risk of double-counting cases.

To my mind, this is akin to the problem that phylogenetic regression tries to solve, except here the relationship between my measures is more ambiguous (and it’s not like I can test their DNA). Nevertheless, can I simply compute the measures’ covariance matrix and pass this to the cov = argument in the gr() function to account for the relationship between the measures? Or is there something more sophisticated that I need to do to account for this?
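For readers less familiar with the two-parameter model mentioned above, here is a minimal sketch of its response function in Python. The function and argument names are made up for illustration, and this is the textbook parameterization rather than anything specific to brms: each measure has a discrimination and a difficulty, and each country-year has a latent democracy score `theta`.

```python
import math

def irt_2pl_prob(theta, discrimination, difficulty):
    """Probability that a measure codes a country-year as democratic (1).

    theta is the latent democracy score of the country-year;
    discrimination and difficulty are the two item parameters.
    All names here are hypothetical, chosen for illustration.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

# A country-year sitting exactly at a measure's difficulty is a coin flip.
p_at_threshold = irt_2pl_prob(theta=0.0, discrimination=1.5, difficulty=0.0)
```

The double-counting worry is that the model treats the 11 responses as independent given `theta`, which is not true if some measures borrow from others.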

For what it’s worth, I’m not comfortable with the proposed approach of first finding a maximum-likelihood covariance matrix and then treating it as fixed in a separate estimation step. The matrix would need to be on the logit scale, and it isn’t clear how you would find the maximum-likelihood covariance matrix for the logits.

On the other hand, this feels like precisely the use case for multivariate probit regression, which leads to a well-identified covariance matrix that you can estimate from data. See 1.15 Multivariate outcomes | Stan User’s Guide. Unfortunately, to my knowledge this does not exist in brms (see Seemingly unrelated / Multivariate Probit Regression · Issue #1366 · paul-buerkner/brms · GitHub).
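To make the multivariate probit idea concrete, here is a rough simulation sketch in Python for just two measures. It only simulates from the model and does not estimate anything; the function name, the correlation of 0.8, and the sample size are assumptions chosen for illustration. Each binary outcome is 1 when a latent standard normal variable is positive, and the dependence between measures lives in the correlation of those latent variables.

```python
import math
import random

def simulate_bivariate_probit(n, rho, seed=1):
    """Draw n pairs of binary outcomes from a two-item probit model.

    The latent variables are standard normal with correlation rho;
    each outcome is 1 when its latent variable is above zero.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Build z2 so that corr(z1, z2) = rho and var(z2) = 1.
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        draws.append((int(z1 > 0.0), int(z2 > 0.0)))
    return draws

draws = simulate_bivariate_probit(n=5000, rho=0.8)
# Share of cases where the two measures give the same coding.
agreement = sum(a == b for a, b in draws) / len(draws)
```

With a strong positive latent correlation the two measures agree far more often than independent coders would, which is the kind of dependence the correlation matrix is meant to capture.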


This is a good point. I hadn’t considered that the covariance matrix would need to be on the logit scale too. For what it’s worth, I was adapting this from the brms guide on fitting phylogenetic models, but perhaps it doesn’t work without a continuous outcome.

Thanks for the guidance on multivariate probit too. I’ll take a look.

Phylogenetic models are fine to use when the outcome is binary; the key difference is that the covariance matrix is known, not fitted.

I might be being dim, and I’m pretty far out of my wheelhouse here, so any advice would be appreciated. To what extent is the covariance matrix of species based on genetic data known in a way that the covariance matrix of items based on response data is not?

Phylogenetic models are models for values of (or correlations between) traits that leverage the fact that closely related species tend to have similar trait values. These days, the relatedness of species is often known a priori based on a well-calibrated molecular phylogeny. One way to represent that “relatedness” graphically is to plot the species on a phylogenetic tree (with metric branch lengths); the total branch length between two species (i.e. twice the branch length between one of them and their most recent common ancestor) represents the total amount of time that those species have been evolving independently. To use a phylogenetic model sensu brms, you take the matrix where rows and columns are species, and entries are the total branch lengths between them. If your trait is continuous and you assume a Brownian motion model of trait evolution through time, then you get a very straightforward expression for the covariance matrix for the expectation of the (standardized) trait value across species. So the true covariance matrix is known a priori, without reference to the trait values you are modeling.
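As a small worked example of that last step, here is a Python sketch that turns a branch-length distance matrix into a Brownian motion covariance matrix. It assumes an ultrametric tree (every tip is the same distance `depth` from the root), so the covariance between two species is proportional to the time they shared before splitting: `depth` minus half the distance between them. The three-species tree and all names are hypothetical.

```python
def bm_covariance(dist, depth, sigma2=1.0):
    """Brownian-motion covariance among the tips of an ultrametric tree.

    dist[i][j] is the total branch length separating tips i and j
    (twice the time back to their most recent common ancestor), and
    depth is the root-to-tip time, assumed equal for every tip.
    """
    n = len(dist)
    # Shared evolutionary time = depth minus time since the common ancestor.
    return [[sigma2 * (depth - dist[i][j] / 2.0) for j in range(n)]
            for i in range(n)]

# Hypothetical tree of depth 1.0: species A and B split 0.25 time units
# ago, and species C split from both at the root.
dist = [
    [0.0, 0.5, 2.0],
    [0.5, 0.0, 2.0],
    [2.0, 2.0, 0.0],
]
cov = bm_covariance(dist, depth=1.0)
```

Here A and B get a covariance of 0.75, C is uncorrelated with both, and nothing about the trait values entered the calculation. That is the sense in which the matrix is known rather than fitted.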