Posterior mean far away from true parameter in GLMM

When I fit the following linear mixed model in Stan

y_{i,j} = \mathbf{x}_{i,j}^T\beta + \alpha_i + \epsilon_{i,j}

where \mathbf{x}_{i,j} includes a constant (so the intercept \beta_0 is the first element of \beta), \epsilon_{i,j}\sim\mathcal{N}(0,\sigma_{\epsilon}^2),\ \beta_k\sim\mathcal{N}(0,100),\ \alpha_i\sim\mathcal{N}(0,\sigma_r^2),\ \sigma_r^2\sim\text{Half-Cauchy}(0,25), with I=20 groups and N=4500 observations in total,

the converged posterior means are far away from the true parameters (the data are simulated). For all parameters I observe a large effective sample size and a Gelman-Rubin statistic of 1.

The true parameters are \beta = [1,\ 0.2,\ -0.75]^T and \alpha = [0.579,\ -0.115,\ -0.125,\ \ldots]^T.

These are the outputs:
                 mean  n_eff  Rhat
beta_fixed[0]    7.71    913   1.0
beta_fixed[1]   -0.02   1200   1.0
beta_fixed[2]   -0.05   1200   1.0
beta_random[0]  -0.72    848   1.0
beta_random[1]  -0.15   1189   1.0
beta_random[2]  -0.83   1082   1.0

where beta_fixed represents \beta and beta_random represents \alpha. Below is the entire code, split into three parts: the data definition, the Stan program, and the PyStan API call.


import numpy as np
import statsmodels.api as sm
from scipy.stats import uniform
import pystan

np.random.seed(1656)

Nobs = 4500
Ngroups = 20

# Split the Nobs observations across the Ngroups groups with Dirichlet weights,
# forcing the counts to sum exactly to Nobs.
group_weights = np.random.dirichlet(alpha=np.ones(Ngroups))
group_counts = [int(np.round(w * Nobs)) for w in group_weights[:-1]]
group_counts.append(Nobs - np.sum(group_counts))
Nobs_per_group = {i: group_counts[i] for i in range(Ngroups)}

def group_hot_encoding(hot_indices, nobs):
    # Indicator vector of length nobs with ones at the given positions.
    a = np.zeros(nobs)
    a[hot_indices] = 1.0
    return a

def random_effect_design_matrix(nobs_per_group):
    # Build an N x Ngroups one-hot design matrix: column g indicates
    # membership in group g, with each group's rows stored contiguously.
    nobs = int(np.sum(list(nobs_per_group.values())))
    rows = []
    for group_idx in range(len(nobs_per_group)):
        start_idx = int(np.sum([nobs_per_group[i] for i in range(group_idx)]))
        hot_indices = range(start_idx, start_idx + nobs_per_group[group_idx])
        rows.append(group_hot_encoding(hot_indices, nobs))
    return np.transpose(np.array(rows))
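
# Illustrative check (not in the original post): for two groups of sizes 2 and 1,
# random_effect_design_matrix({0: 2, 1: 1}) returns the 3 x 2 indicator matrix
# [[1., 0.],
#  [1., 0.],
#  [0., 1.]]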

X_random_effect = random_effect_design_matrix(Nobs_per_group)
X_fixed_effect = sm.add_constant(np.column_stack((uniform.rvs(size=Nobs),uniform.rvs(size=Nobs))))

beta_fixed = np.array([1, 0.2, -0.75])
beta_random = np.array([0.579, -0.115, -0.125, 0.169, -0.5, -1.429, -1.171, -0.205, 0.193, 0.041, -0.917, -0.353, -1.197,
                        1.044, 1.084, -0.085, -0.886, -0.352, -1.398, 0.35])

# The original post omits the simulation of Y; it is reconstructed here from
# the model definition, with the noise scale sigma_eps assumed to be 1.
Y = X_fixed_effect @ beta_fixed + X_random_effect @ beta_random + np.random.normal(0, 1.0, size=Nobs)

model_data = {}
model_data['N'] = len(Y)
model_data['D_fixed'] = X_fixed_effect.shape[1]
model_data['D_random'] = X_random_effect.shape[1]
model_data['Y'] = Y
model_data['X_fixed'] = X_fixed_effect
model_data['X_random'] = X_random_effect
model_data['prior_mean_beta_fixed'] = np.zeros_like(beta_fixed)
model_data['prior_cov_beta_fixed'] = 100 * np.eye(len(beta_fixed))
model_data['prior_mean_beta_random'] = np.zeros_like(beta_random)
model_data['prior_cov_beta_random'] = np.eye(len(beta_random))
stan_code = """
data {
    int<lower=0> N;
    int<lower=0> D_fixed;
    int<lower=0> D_random;
    matrix[N,D_fixed]  X_fixed;
    matrix[N,D_random] X_random;
    vector[N] Y;
    
    vector[D_fixed]           prior_mean_beta_fixed;
    matrix[D_fixed,D_fixed]   prior_cov_beta_fixed;
    
    vector[D_random]            prior_mean_beta_random;
    matrix[D_random,D_random]   prior_cov_beta_random;
}
parameters {
    vector[D_fixed] beta_fixed;
    vector[D_random] beta_random;
    real<lower=0> sigma_eps;
    real<lower=0> sigma_random;
}
model {
    // the <lower=0> constraints truncate these to half-Cauchy priors
    sigma_random ~ cauchy(0,25);
    sigma_eps ~ cauchy(0,25);
    beta_fixed  ~ multi_normal(prior_mean_beta_fixed, prior_cov_beta_fixed);
    // sigma_random scales the covariance, so it plays the role of the
    // variance sigma_r^2 in the model statement above
    beta_random ~ multi_normal(prior_mean_beta_random, sigma_random * prior_cov_beta_random);
    
    Y ~ normal(X_fixed * beta_fixed + X_random * beta_random, sigma_eps);
}
"""
fit = pystan.stan(model_code=stan_code, data=model_data, iter=10000, chains=3, thin=10, warmup=6000, n_jobs=3)
# 3 chains x (10000 - 6000) post-warmup iterations / thin = 10 gives 1200 retained draws
for line in str(fit).split('\n'):
    print(line)

I re-ran with 20k iterations per chain and used the model diagnostics that Michael Betancourt kindly provides on his GitHub; everything looks good:

> stan_utility.check_all_diagnostics(fit)

n_eff / iter looks reasonable for all parameters
Rhat looks reasonable for all parameters
0.0 of 4200 iterations ended with a divergence (0.0%)
0 of 4200 iterations saturated the maximum tree depth of 10 (0.0%)
E-BFMI indicated no pathological behavior

Thanks for the help

There is no guarantee that the posterior mean will be close to the true values. Even in ideal circumstances the best we can typically hope for is that the true value will be contained within the posterior, meaning that a more appropriate test would be whether or not the true value lies within, say, the 5% and 95% posterior quantiles.
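
For the fit in the question, that check is only a few lines; here is a sketch reusing the question's fit object and the true beta_fixed values (the 5%/95% bounds are one reasonable choice, not the only one):

samples = fit.extract(permuted=True)
draws = samples['beta_fixed']                    # shape: (n_draws, D_fixed)
lo, hi = np.percentile(draws, [5, 95], axis=0)   # pointwise 5% and 95% posterior quantiles
for k, truth in enumerate(beta_fixed):
    print('beta_fixed[%d]: true=%.2f, 90%% interval=(%.2f, %.2f), covered=%s'
          % (k, truth, lo[k], hi[k], lo[k] <= truth <= hi[k]))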

That said, even in those ideal circumstances a specific posterior interval like this won't always contain the true value; ultimately you have to look at as many simulated observations as possible. The best approach is to sample true parameter values from your priors and then simulate observations from those truths.
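
In code, that workflow looks roughly like the following sketch, assuming the variables from the question are in scope; for brevity it redraws only beta_fixed from its prior and tracks coverage for the intercept, while a full check would also redraw alpha and the variance parameters:

sm_compiled = pystan.StanModel(model_code=stan_code)  # compile once, reuse across refits

n_sims = 100
n_covered = 0
for s in range(n_sims):
    # Draw a "truth" from the prior: sd 10 corresponds to the N(0, 100) prior above.
    beta_true = np.random.normal(0, 10, size=3)
    # Simulate observations from that truth, reusing the question's design matrices.
    y_sim = X_fixed_effect @ beta_true + X_random_effect @ beta_random + np.random.normal(0, 1, size=Nobs)
    sim_fit = sm_compiled.sampling(data=dict(model_data, Y=y_sim), iter=2000, chains=2)
    # Record whether the truth lands inside the central 90% posterior interval.
    lo, hi = np.percentile(sim_fit.extract()['beta_fixed'][:, 0], [5, 95])
    n_covered += (lo <= beta_true[0] <= hi)

print('coverage for beta_fixed[0]: %.2f (expect roughly 0.90)' % (n_covered / n_sims))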