Apologies in advance for the long post.

I am working on “micro-macro” multilevel regression, in which group-level responses are predicted from individual-level covariates. It has the same structure as regression with covariate measurement error, except that there are more covariate replications per group than is typical in measurement-error models. The model for the data is as follows:

\begin{align} Y_i &\sim \text{Normal}\left(\alpha + \beta X_i,\, \sigma_Y\right), & i &= 1, \ldots, N, \\ W_{ij} &\sim \text{Normal}\left(X_i,\, \sigma_W\right), & j &= 1, \ldots, n, \end{align}

where Y_i is the response for the i^\mathrm{th} group and W_{ij} is the measured covariate for the j^\mathrm{th} individual in the i^\mathrm{th} group (assume for simplicity that all groups contain n individuals). The unobserved group-level covariates X_i are given a \mathrm{Normal}\left(0, \sigma_X\right) prior.
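For concreteness, here is a minimal sketch of a simulator for this generative model (this is not the attached R code, and the parameter values are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
N, n = 50, 10                    # groups, individuals per group
alpha, beta = 1.0, 5.0           # regression intercept and slope
sigma_X, sigma_W, sigma_Y = 1.0, 2.0, 0.5

X = rng.normal(0.0, sigma_X, size=N)              # latent group-level covariates
W = rng.normal(X[:, None], sigma_W, size=(N, n))  # individual-level measurements
Y = rng.normal(alpha + beta * X, sigma_Y)         # group-level responses
```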

I am currently testing this model with simulated data (generated using the attached R code) and have noticed that certain parameter combinations result in inescapably poor behaviour in Stan. Specifically, if \sigma_W/\sqrt{n} is large relative to \sigma_Y, the posterior for \sigma_Y tends to be significantly biased upward. If \sigma_W/\sqrt{n} is large relative to \sigma_X, the NUTS chains have very low BFMIs and low effective sample sizes (oddly enough, this poor sampling behaviour usually manifests for \sigma_Y, even when the first problem is not present).

The severity of these problems depends on the magnitude of the true regression slope \beta: the larger it is, the more sensitive the model is to relative differences between the \sigma's. Conversely, for small enough \beta, the difficulties disappear.

Note that \sigma_W/\sqrt{n} is the standard deviation (and \sigma_W^2/n the variance) of the “naive covariates” \overline{W}_i, the per-group means of the W_{ij}; using these in an ordinary regression is known to attenuate the estimated slope. Therefore, it seems that the problems arise when the effect size is large and the model has to overcome a large gap between the “naive regression” and the “true regression”.
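The attenuation can be checked numerically: regressing Y on the naive group means recovers roughly \beta \sigma_X^2 / (\sigma_X^2 + \sigma_W^2/n) rather than \beta itself. A sketch (all parameter values are arbitrary illustrations; a large number of groups is used so the bias shows cleanly):

```python
import numpy as np

rng = np.random.default_rng(7)
N, n = 20000, 4
alpha, beta = 0.0, 5.0
sigma_X, sigma_W, sigma_Y = 1.0, 2.0, 0.5

X = rng.normal(0.0, sigma_X, size=N)
# Mean of n measurements has sd sigma_W / sqrt(n) around X
W_bar = X + rng.normal(0.0, sigma_W / np.sqrt(n), size=N)
Y = rng.normal(alpha + beta * X, sigma_Y)

naive_slope = np.polyfit(W_bar, Y, 1)[0]         # OLS slope of Y on W_bar
attenuation = sigma_X**2 / (sigma_X**2 + sigma_W**2 / n)  # reliability ratio
print(naive_slope, beta * attenuation)           # both near 2.5, well below beta = 5
```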

I’ve tried every trick in the book to resolve this: scaling the priors differently, centered/non-centered parameterizations, standardizing/not standardizing the data, using heavier-tailed or even improper priors, using a dense mass matrix, running for more iterations with adapt_delta close to 1. The only thing that sometimes works is a zero-avoiding prior for \sigma_Y, and even that fails for many parameter configurations (and when it does work, the resulting posterior for \sigma_Y is quite biased unless the hyperparameters are chosen very carefully).

Does anyone have experience with this type of difficulty? I’m starting to suspect that the difficulties are inherent to the model itself and that it will resist any possible fix. Any help is greatly appreciated.

```
data {
  int<lower = 1> N;           // Number of groups
  int<lower = 1> M;           // Total number of covariate observations (sum of the n[i])
  vector[N] Y;                // Group-level regression responses
  vector[M] W;                // Individual-level covariate observations, concatenated by group
  array[N] int<lower = 1> n;  // Number of covariate observations per group
}
parameters {
  real alpha_raw;
  real beta_raw;
  real<lower = 0> sigma_Y_raw;
  real<lower = 0> sigma_X_raw;
  real<lower = 0> sigma_W_raw;
  vector[N] X_raw;
}
transformed parameters {
  real sigma_Y = 0.5 * sigma_Y_raw;
  real sigma_X = 2 * sigma_X_raw;
  real sigma_W = 2 * sigma_W_raw;
  vector[N] X = sigma_X * X_raw;  // Non-centered parameterization of X
  real alpha = 15 * alpha_raw;
  real beta = 15 * beta_raw;
}
model {
  // Likelihood for the individual-level covariates in each group
  int pos = 1;
  for (i in 1:N) {
    segment(W, pos, n[i]) ~ normal(X[i], sigma_W);
    pos += n[i];
  }
  X_raw ~ std_normal();           // Implies X ~ normal(0, sigma_X)
  Y ~ normal(alpha + beta * X, sigma_Y);
  // Priors
  sigma_Y_raw ~ std_normal();     // Implies sigma_Y ~ normal(0, 0.5), half-normal
  sigma_X_raw ~ std_normal();     // Implies sigma_X ~ normal(0, 2), half-normal
  sigma_W_raw ~ std_normal();     // Implies sigma_W ~ normal(0, 2), half-normal
  beta_raw ~ normal(0, sigma_Y);  // Implies beta ~ normal(0, 15 * sigma_Y)
  alpha_raw ~ normal(0, sigma_Y); // Implies alpha ~ normal(0, 15 * sigma_Y)
}
generated quantities {
  vector[N] fitted_values = alpha + beta * X;
}
```
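For reference, this is how the ragged covariate data would be packed for the program above: W is the concatenation of each group's measurements in group order, so the segment(W, pos, n[i]) loop can walk through it. A sketch (the group sizes and parameter values here are made up):

```python
import numpy as np

rng = np.random.default_rng(3)
n = [3, 5, 2]                  # n[i]: observations per group
N, M = len(n), sum(n)
X_true = rng.normal(0.0, 1.0, size=N)
# Concatenate each group's measurements in group order
W = np.concatenate([rng.normal(X_true[i], 2.0, size=n[i]) for i in range(N)])
Y = rng.normal(1.0 + 5.0 * X_true, 0.5)

stan_data = {"N": N, "M": M, "Y": Y, "W": W, "n": n}
```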

multilevel_regression.R (1.0 KB)

scalar_micromacro_regression.stan (1.4 KB)