Multiple varying intercepts in a two-stage multilevel model

This is a variation on the topic "Overcoming the Simpson's paradox in a multivariate linear model with within- and between-site gradients and repeated measurements".

Let’s assume we have observed Y at multiple sites positioned along a gradient of X. Y was also observed for several groups at each site, with several Y observations for each combination of site and group. The figure below provides an idea of what is going on (but assume we could have many more sites, groups, and replicates).
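To make the design concrete, here is a minimal sketch (in Python, with made-up parameter values) of the kind of data described: sites spread along an x gradient, site-level intercepts that follow that gradient, group offsets, and replicated observations:

```python
import numpy as np

# Hypothetical toy-data generator matching the described design
# (all parameter values below are invented for illustration).
rng = np.random.default_rng(1)
n_site, n_group, n_rep = 6, 3, 5

x_site = np.linspace(-1.5, 1.5, n_site)              # between-site gradient
alpha_site = 0.8 * x_site + rng.normal(0, 0.2, n_site)  # site intercepts follow x
alpha_group = rng.normal(0, 0.5, n_group)            # group offsets

site = np.repeat(np.arange(n_site), n_group * n_rep)
group = np.tile(np.repeat(np.arange(n_group), n_rep), n_site)
x = x_site[site] + rng.normal(0, 0.1, site.size)     # x also varies within site
y = alpha_site[site] + alpha_group[group] + rng.normal(0, 0.3, site.size)
```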

I am struggling to figure out how to account for site while letting the intercept of the relation between Y and X vary by group. Adding a varying intercept for site to a simple multilevel model would amount to regressing Y against X within each site, which is not what I want (the gradient of interest for X runs between sites, not within each site).

I tried the following “two-stage” multilevel model, but it has a very hard time converging on real data: the chains stay stuck at their starting values for every parameter. The model does converge on the toy data depicted in the figure above, although it has pretty bad identifiability issues among the alpha parameters (strong correlations in pair plots between \alpha_{\text{site}}, \alpha_{\text{group}}, and \alpha).

\begin{aligned}
y_{i} &\sim \text{Normal}(\mu_{i}, \sigma) & \text{stage 1}\\
\mu_{i} &= \alpha_{\text{site}[i]} + \alpha_{\text{group}[i]}\\
\alpha_{\text{site}[i]} &\sim \text{Normal}(\nu_{i}, \sigma_{\alpha_{\text{site}}}) & \text{stage 2}\\
\nu_i &= \alpha + \beta x_i\\
\alpha_{\text{group}}, \alpha, \beta &\sim \text{Normal}(\ldots) & \text{priors}\\
\sigma, \sigma_{\alpha_{\text{site}}} &\sim \text{Exponential}(\ldots)
\end{aligned}

Which I coded in Stan this way:

data {
  int<lower=0> N;              // number of observations
  int<lower=0> N_group;        // number of groups
  array[N] int<lower=1> group; // vector of group id
  int<lower=0> N_site;         // number of sites
  array[N] int<lower=1> site;  // vector of site id
  vector[N] y;                 // vector of y (outcome observations)
  vector[N] x;                 // vector of x (predictor)
}

parameters {
  real alpha;
  real beta;
  real<lower=0> sigma_alphaS;
  vector[N_site] alphaS;
  vector[N_group] alphaG;
  real<lower=0> sigma;
}

model {
  
  // stage 2: model site mean along between-site gradient (x)
  alpha ~ normal(0, 0.5);
  beta ~ normal(0, 0.5);
  sigma_alphaS ~ exponential(1);
  // note: site has length N, so each alphaS[s] receives one such statement
  // per observation at site s, not a single site-level statement
  alphaS[site] ~ normal(alpha + beta * x, sigma_alphaS); 
  
  // stage 1: model site mean, and group effect that varies within site (intercept only model)
  alphaG ~ std_normal();
  sigma ~ exponential(1);
  y ~ normal(alphaS[site] + alphaG[group], sigma);

}

What is a good strategy to account for site and group here? I have already worked on prior predictive simulation to find better priors, and I will push that further. The fact that I can’t get this to fit real data, together with the identifiability issue on the toy data, makes me think I should try a different parameterization of the model above; is there something I should attempt first?

Can you explain (in words) the model for \alpha_{\text{site}[i]}? In the scatter plot x_i varies by observation, so \nu_i = \alpha + \beta x_i is going to be different for all data points (x_i,y_i) in the same \text{site}[i]. So the 2nd stage regression is a bit unclear as there is one \alpha_{\text{site}[i]} per site and one \nu_i per observation.

PS: Have you considered fitting this model with brms as a starting point? Priors aside (and there are issues with those, especially for the intercept parameters), I think your model can be written as y ~ group + x + (1 | site).


The idea behind the model for \alpha_{\text{site}} is to predict the average y for a site from x. Because there are multiple x per site, I reuse \alpha_{\text{site}} for each occurrence of x, hence the notation \alpha_{\text{site}[i]}. I think that’s what my Stan code is doing… but I confess this model is still confusing to me, and it might be wrong.

I haven’t tried brms, but isn’t y ~ group + x + (1 | site) a simple multilevel model that lets the intercept vary by site? In that case it would be exactly the model I don’t want to use, because it would consider the relation between y and x at the within-site scale (I want the between-site scale).

FYI, I also just tried modeling a per-site mean of x in stage 1:

\begin{aligned}
y_{i} &\sim \text{Normal}(\mu_{i}, \sigma) && \text{stage 1, for $i$ in 1:N observations}\\
\mu_{i} &= \alpha_{\text{site}[i]} + \alpha_{\text{group}[i]}\\
x_i &\sim \text{Normal}(\bar{x}_{\text{site}[i]}, \sigma_x)\\
\alpha_{\text{site}[j]} &\sim \text{Normal}(\nu_{j}, \sigma_{\alpha_{\text{site}}}) && \text{stage 2, for $j$ in 1:J sites}\\
\nu_j &= \alpha + \beta \bar{x}_{\text{site}[j]}\\
\alpha_{\text{group}}, \bar{x}_{\text{site}}, \alpha, \beta &\sim \text{Normal}(\ldots) && \text{priors}\\
\sigma, \sigma_x, \sigma_{\alpha_{\text{site}}} &\sim \text{Exponential}(\ldots)
\end{aligned}

I am not sure it makes more sense. On the toy example (the one in the figure in the initial post), the identifiability problem persists (strong correlations between the sampled alpha parameters), and divergences appear.

The Stan code I used:

data {
  int<lower=0> N;              // number of observations
  int<lower=0> N_group;        // number of groups
  array[N] int<lower=1> group; // vector of group id
  int<lower=0> N_site;         // number of sites
  array[N] int<lower=1> site;  // vector of site id
  vector[N] y;                 // vector of y (outcome observations)
  vector[N] x;                 // vector of x (predictor)
}

parameters {
  real alpha;
  real beta;
  real<lower=0> sigma_alphaS;
  vector[N_site] alphaS;
  vector[N_group] alphaG;
  real<lower=0> sigma;
  vector[N_site] xbar;
  real<lower=0> sigma_x;
}

model {
  
  // stage 2: model site mean along between-site gradient (x)
  alpha ~ normal(0, 0.5);
  beta ~ normal(0, 0.5);
  sigma_alphaS ~ exponential(1);
  alphaS ~ normal(alpha + beta * xbar, sigma_alphaS);
  
  // stage 1: model site mean, and group effect that varies within site (intercept only model)
  // average y
  alphaG ~ std_normal();
  sigma ~ exponential(1);
  y ~ normal(alphaS[site] + alphaG[group], sigma);
  // average x
  xbar ~ std_normal();
  sigma_x ~ exponential(1);
  x ~ normal(xbar[site], sigma_x);

}

This isn’t clear because, as you note, x varies within site. So “a site with x” isn’t well-defined: each site happens to have multiple observed values for x. Do you want \operatorname{E}(Y_i) = \alpha_{\text{group}[i]} + \beta x_i or perhaps \operatorname{E}(Y_i) = \alpha_{\text{group}[i]} + \beta\bar{x}_{\text{site}[i]} where \bar{x}_{\text{site}[i]} is the average x observed at \text{site}[i]?

Update (since you posted while I was typing): It’s not surprising there is an identifiability issue; there are too many intercept parameters. Try to develop a model for one group first; in that case you can assume \alpha_{\text{group}[i]} = 0 and you’d need to estimate a single intercept, \alpha.
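The additive non-identifiability is easy to see directly: shifting all the site intercepts by a constant and the group intercepts by its negative leaves the likelihood unchanged. A quick numpy check with arbitrary made-up values:

```python
import numpy as np

rng = np.random.default_rng(0)
site = rng.integers(0, 4, 30)    # hypothetical site index per observation
group = rng.integers(0, 3, 30)   # hypothetical group index per observation
alphaS = rng.normal(0, 1, 4)     # site intercepts
alphaG = rng.normal(0, 1, 3)     # group intercepts

mu = alphaS[site] + alphaG[group]
c = 2.7                          # any constant shift
mu_shifted = (alphaS + c)[site] + (alphaG - c)[group]
print(np.allclose(mu, mu_shifted))  # prints True: the data cannot tell them apart
```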

Well, \text{E}(y_i) = \alpha_{\text{group}[i]} + \beta x_i (or the \bar{x} version) is ultimately what I am looking for. The challenge I set for this forum topic is to account for site and group while keeping all the raw data. Two easy ways to sidestep the challenge would be: 1) ignore site, or 2) average the data per site and group for y and x and then fit a simple linear model \bar{y} = f(\text{group}, \bar{x}).
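For reference, escape route 2 is easy to sketch (Python, with hypothetical column names and simulated stand-in data; ordinary least squares on the cell means, with dummy-coded group intercepts):

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data with columns site, group, x, y (invented here).
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "site": rng.integers(0, 5, 100),
    "group": rng.integers(0, 3, 100),
})
df["x"] = 0.5 * df["site"] + rng.normal(0, 0.1, 100)
df["y"] = 0.8 * df["x"] + 0.3 * df["group"] + rng.normal(0, 0.2, 100)

# Average y and x per site-and-group cell, then fit a plain linear model.
cell = df.groupby(["site", "group"], as_index=False)[["x", "y"]].mean()
G = pd.get_dummies(cell["group"]).to_numpy(float)    # one intercept per group
X = np.column_stack([G, cell["x"]])
coef, *_ = np.linalg.lstsq(X, cell["y"].to_numpy(), rcond=None)
# coef[-1] estimates the between-site slope on x-bar
```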

I am still interested to know how people would approach this situation, or if someone would spot a flaw in my approach.

I also tried the following simpler approach. It runs flawlessly, but I am wondering whether it is optimal (maybe I should say “full luxury”).

\begin{aligned}
x_i &\sim \text{Normal}(\bar{x}_{\text{site}[i]}, \sigma_x) && \text{submodel for $x$}\\
y_i &\sim \text{Normal}(\mu_i, \sigma) && \text{main model for $y$}\\
\mu_i &= \alpha_{\text{group}[i]} + \beta \bar{x}_{\text{site}[i]}\\
\bar{x}_{\text{site}}, \alpha_{\text{group}}, \beta &\sim \text{Normal}(0, 1) && \text{priors}\\
\sigma_x, \sigma &\sim \text{Exponential}(1)
\end{aligned}

Stan code:

data {
  int<lower=0> N;              // number of observations
  int<lower=0> N_group;        // number of groups
  array[N] int<lower=1> group; // vector of group id
  int<lower=0> N_site;         // number of sites
  array[N] int<lower=1> site;  // vector of site id
  vector[N] y;                 // vector of y (outcome observations)
  vector[N] x;                 // vector of x (predictor)
}

parameters {
  vector[N_site] x_bar;
  real<lower=0> sigma_x;
  vector[N_group] alphaG;
  real beta;
  real<lower=0> sigma;
}

model {
  
  // submodel for x
  x_bar ~ std_normal();
  sigma_x ~ exponential(1);
  x ~ normal(x_bar[site], sigma_x);
  
  // main model for y
  alphaG ~ std_normal();
  beta ~ std_normal();
  sigma ~ exponential(1);
  y ~ normal(alphaG[group] + beta * x_bar[site], sigma);

}

Does sampling run well on the real data, and have you assessed the model fit, e.g. using posterior predictive checks? If the model fits the data well, then that’s already a quite luxurious situation.
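A generic posterior predictive check can be sketched with numpy, assuming the posterior draws from the last model have been extracted as arrays (everything below is stand-in data; with a real fit, the `*_draws` arrays and `y`, `site`, `group` would come from your model and data):

```python
import numpy as np

# Stand-in data and draws; shapes mirror the last model:
# alphaG_draws (n_draws, N_group), x_bar_draws (n_draws, N_site),
# beta_draws and sigma_draws (n_draws,).
rng = np.random.default_rng(3)
n_draws, N, N_site, N_group = 200, 60, 4, 3
site = rng.integers(0, N_site, N)
group = rng.integers(0, N_group, N)
y = rng.normal(0, 1, N)                        # observed outcome (stand-in)

alphaG_draws = rng.normal(0, 0.5, (n_draws, N_group))
x_bar_draws = rng.normal(0, 1, (n_draws, N_site))
beta_draws = rng.normal(0.8, 0.1, n_draws)
sigma_draws = np.abs(rng.normal(0.5, 0.1, n_draws))

# Simulate one replicated data set per draw, then compare a summary
# statistic of the replicates to the same statistic of the observed y.
mu = alphaG_draws[:, group] + beta_draws[:, None] * x_bar_draws[:, site]
y_rep = rng.normal(mu, sigma_draws[:, None])   # (n_draws, N)
p_value = np.mean(y_rep.std(axis=1) > y.std()) # PPC p-value for sd(y)
```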