I want to estimate a model with two random intercepts and a continuous response variable, y. The two random variables are not nested.
I want to allow for, and estimate the covariance between the two random variables.

ChatGPT suggested the following code where it seems like I’m able to estimate the correct variance of the two variables. However the covariance is incorrect.

data {
int<lower=0> N; // Number of observations
int<lower=1> g1[N]; // Grouping variable 1
int<lower=1> g2[N]; // Grouping variable 2
vector[N] y; // Outcome variable
int<lower=1> K; // Number of levels for groups
}
parameters {
real mu; // Population-level intercept
vector[2] alpha[K]; // Group-level intercepts for g1 and g2
real<lower=0> sigma; // Residual standard deviation
cholesky_factor_corr[2] L_Omega; // Cholesky factor of correlation matrix
vector<lower=0>[2] tau; // Standard deviations
}
model {
// Priors
mu ~ normal(0, 10);
L_Omega ~ lkj_corr_cholesky(2);
tau ~ cauchy(0, 5);
for (k in 1:K) {
alpha[k] ~ multi_normal_cholesky(rep_vector(0, 2), diag_pre_multiply(tau, L_Omega));
}
sigma ~ cauchy(0, 5);
// Likelihood
for (n in 1:N) {
y[n] ~ normal(mu + alpha[g1[n], 1] + alpha[g2[n], 2], sigma);
}
}

I am then using the following R-code to extract the estimated covariance.

fit <- sampling(model,
data = list(
N = nrow(dt),
K = length(unique(c(dt$g1, dt$g2))),
g1 = as.integer(dt$g1),
g2 = as.integer(dt$g2),
y = dt$y
),
iter = 8000,
chains = 10)
posterior_samples <- rstan::extract(fit)
L_Omega_estimates <- posterior_samples$L_Omega
tau_estimates <- posterior_samples$tau
# calculate the covariance for each sample
cov_estimates <-
L_Omega_estimates[, 2, 1] * tau_estimates[, 1] * tau_estimates[, 2]
# summarize the posterior distribution of the covariance
estimated.cov <- mean(cov_estimates)

Not sure if I’ve specified to stan model correctly. Can anyone point me in the right direction?

What do you mean when you say the covariance is incorrect?

As an aside, I’m having a little bit of trouble envisioning a modeling scenario where we have two separate factor variables with a clear 1:1 correspondence between the factor levels (so that level “A” of factor 1 clearly corresponds with level “A” of factor 2), which is what we need to meaningfully speak of the covariance. I’d be curious to hear more about the context.

I’m using the following code to simulate data and my hunch is that the estimated covariance should correspond to true one as computed below

library(data.table)
# Number of individual observations
n <- 5000
# Number of groups
k1 <- 500
k2 <- 100
# Group 1
# Generate random intercepts
dt1 <- data.table(v1 = rnorm(k1))
# Group identifier
dt1[, g1 := .I]
# Expand data set to n observations
dt1 <- dt1[rep(1:nrow(dt1), n / k1),]
# Create a noisy version of values for sorting
# Correlation between group effects will depend on the standard deviation we set here
dt1[, sort_col := v1 + rnorm(.N, sd = 3)]
# Sort data set according to noisy values
setorder(dt1, sort_col)
# Group 2
dt2 <- data.table(v2 = rnorm(k2))
dt2[, g2 := .I]
dt2 <- dt2[rep(1:nrow(dt2), n / k2),]
setorder(dt2, v2)
# Combine the groups
dt <- cbind(dt1, dt2)
# Group means are now correlated, but not perfectly
true.cov <- dt[, cov(v1, v2)]

I’m not really following your reasoning. In economics it is common to estimate such models using fixed effects. Each observation is then given a fixed effect for each group type from which you can estimate the covariance between the two variables, independent of the number of levels. The example I have in mind are so called akm-models where the outcome variable is earnings for individual i in year t. We are often interested in decomposing the variance in earnings coming from variation within the individual and variation within firms plus the covariance. In this context, individuals would be one factor variable and firms the other factor variable.

I also don’t understand the covariance structure, and unfortunately I’m not grasping where it comes from in the two-way FE sources that I’m seeing (e.g. here, here, and here). I do see a few places where the data you’re generating don’t match the Stan model that you produced with ChatGPT. Maybe clarifying those issues would help with the model.

As I understand it, you’re estimating a two-way fixed effects model/cross-classified model. Following the individual/firm wage example, maybe these are professors moving between universities. So you want to control for each professor’s mean (FE) wage and each university’s mean (FE) wages. You then want to account for the covariance between the two, e.g. do professors with higher mean wages also end up in universities that pay more on average.

ChatGPT has given you a model where the mean for professor 1 is assumed to be correlated with the mean for university 1, even if there are no observations where the two are paired. Further, since you have 500 professors and only 50 universities, alpha[51:500,2] will all be unobserved.

Here, the means for professors and universities are treated separately and so we get the two-way/cross-classified pattern you’re aiming for. But the correlation structure becomes fuzzier. Unfortunately, the only ideas I have are my own intuitions, not anything that clearly maps onto the original paper or linked sources.

Agree with all of @simonbrauer 's post. To me, the term “correlated random effects” refers to a scenario where random effects are sampled from a multivariate distribution (generally a multivariate normal) with a fitted covariance parameter, and this requires 1:1 correspondence between the levels of one of the random effects and the levels of the other. For example, these might be a correlated intercept and slope for the same grouping.

The correlation you want is instead the observed correlation between the values across the rows of the data. Like @simonbrauer, I don’t have a firm idea of how to do this. One thing I notice is this: the range of possible correlations depends on the covariates themselves. If we take a fitted model and attempt to predict the response at the locations of new data, we might find that it’s impossible to recover the same correlation (or even a similar correlation). For example, if we pass new data wherein every employee shows up at every firm in a round-robin manner, then the correlation can only be zero. Thus, your model cannot have a fitted parameter that is the correlation (or covariance), but at most can have a fitted parameter that controls some tendency for the correlation to be large/small, subject to the restrictions imposed by the specific data that are passed.

You have provided a way to simulate data, but it relies on rank statistics to perform the simulation, which will be hard or impossible to recapitulate with a Stan model. Is this the true generative process you wish to capture? If not, can you write down a simulation that reflects the generative process you really want? If we can identify the specific generative process that you’re after, then we will either be able to help you write it in Stan or advise that it will be difficult to fit with Stan (in which case it may well be difficult or impossible to fit with other tools as well–a generative process that relies on ranks scares me!).

Just a thought: maybe what you want is a prior on the random effects that has the form of a type II linear regression with no intercept, such that the slope of this regression is a parameter to be estimated?

Edit: and of course with equality constraints on the fitted values for each firm and each employee.

I could also see the individual level mean being correlated with individual level probability of being in a particular group (which might be dependent on the group means) . But it’s unclear how hard that would be to estimate and whether it matches the original two-way fixed effects model.

Thanks for insightful comments. Can you ellaborate what you mean by “relies on rank statistics to perform the simulaion” @jsocolar? I’ve tried to simulate data where I explicitly declare what the covariance should be but without success.

If it makes things clearer, this article does what I want (In particular see page 307), the estimation technique is a bit different but the estimated parameter is exactly what I’m after.

Thank you for sharing the article. I’ll give it a read.

In the meantime, I assume that this is the relevant section (age 302)? The unusual part (to @jsocolar and me, at least) is that Equation 3 identifies the covariance between an N-length vector (\alpha_i) and a J-length vector (\psi_{j(i.t)}). Is the model “copying” the entries of \alpha_i and \psi_{j} so that there is one pair for each NT observation and taking the covariance of that? That would be doable (i.e. codable) in Stan, though, again, I’m not sure it would converge to the correct answer. Along @jsocolar 's suggestion, it would not be a generative model in the sense that you could recreate the data from that particular probability model. That may not be a practical problem for your purposes, but it’s worth a note.

Your data simulation requires figuring out the rank (i.e. the sorting order) of the entries in a big random vector. There are a lot of these ranks, and an entry’s rank is discontinuous in that entry’s value (as well as in the values of the other entries), which is a nightmare for sampling. So implementing this particular data generating process in code seems really hard, especially because there is a combinatorial explosion of possible rankings.

Yes, indeed it is the covariance term in equation 3 on page 302 that i want to estimate

That seems right. In the fixed effect setting, we estimate the fixed effects and assign the fixed effects
to each observation. Each observation then has two values based on the estimated fixed effects, one value for the firm and one value for the individual. We then have two NT size vectors of fixed effects from which we can estimate the covariance.

If that’s the case, then the Stan version of that likelihood would be something like the model below. Notice how there are only K1 elements of alpha1 and K2 elements of alpha2 (I made an error in the barebones model code above, which I have since fixed). I then copy those over into an observation-wise matrix alpha12 using the selectors (arrays of integers) G1 and G2. I then treat those as multivariate-normally distributed.

As stated above, this isn’t really proper in a generative sense, and I have no sense of the potential problems it could run into. But maybe it still converges to some useful estimates. Hope that helps.