This is a variation on Overcoming the Simpson's paradox in a multivariate linear model with within and between site gradients and repeated measurements

Let’s assume we have observed Y at multiple sites positioned along a gradient of X. Y was also observed for several groups at each site. There are several Y observations for each combination of site and group. The figure below provides an idea of what is going on (but assume we could have way more sites and groups and replicates).

I am struggling to figure out how to account for site while letting the intercept of the relation between Y and X vary by group. In fact, adding a varying intercept for site in a simple multilevel model would imply to regress Y against X at each site level, which is not what I want (I assume the gradient of interest for X is between sites, not within each site).

I tried the following “two-staged” multilevel model but it has a very hard time converging on real data, the chains stay stuck where they start for each parameter. Note the model does converge on the toy data depicted in the figure above, although it has pretty bad identifiability issues between all the alpha parameters (strong correlations in pair plots between \alpha_{\text{site}}, \alpha_{\text{group}}, and \alpha).

Which I coded in Stan this way:

```
data {
int<lower=0> N; // number of observations
int<lower=0> N_group; // number of groups
array[N] int<lower=1> group; // vector of group id
int<lower=0> N_site; // number of sites
array[N] int<lower=1> site; // vector of site id
vector[N] y; // vector of y (outcome observations)
vector[N] x; // vector of x (predictor)
}
parameters {
real alpha;
real beta;
real<lower=0> sigma_alphaS;
vector[N_site] alphaS;
vector[N_group] alphaG;
real<lower=0> sigma;
}
model {
// stage 2: model site mean along between-site gradient (x)
alpha ~ normal(0, 0.5);
beta ~ normal(0, 0.5);
sigma_alphaS ~ exponential(1);
alphaS[site] ~ normal(alpha + beta * x, sigma_alphaS);
// stage 1: model site mean, and group effect that varies within site (intercept only model)
alphaG ~ std_normal();
sigma ~ exponential(1);
y ~ normal(alphaS[site] + alphaG[group], sigma);
}
```

What is a good strategy to account for site and group here? I have already worked on prior predictive simulation to find better priors, I will push it further. The fact I can’t get this to fit real data, and the identifiability issue in the toy data make me think I could try a different parameterization of the model above; is there something I should attempt first?