When I fit the following linear mixed model in stan
y_{i,j} = \beta_0 + \mathbf{x}_{i,j}^T\beta + \alpha_i + \epsilon_{i,j}
where \epsilon_{i,j}\sim\mathcal{N}(0,\sigma_{\epsilon}^2),\ \beta_k\sim\mathcal{N}(0,100),\ \alpha_l\sim\mathcal{N}(0,\sigma_r^2),\ \sigma_r^2\sim\text{Half-Cauchy}(0,25) and I=4500,\ J=20
the converged posterior means are far away from true parameters (simulated data). For all parameters I observe a large enough effective sample size and a Gleman-Rubin statistic of 1.
The true parameters are \beta = [ 1. , 0.2 , -0.75]^T and \alpha=[ 0.579, -0.115, -0.125, \ldots ]^T.
These are the outputs:
mean n_eff Rhat
beta_fixed[0] 7.71 913 1.0
beta_fixed[1] -0.02 1200 1.0
beta_fixed[2] -0.05 1200 1.0
beta_random[0] -0.72 848 1.0
beta_random[1] -0.15 1189 1.0
beta_random[2] -0.83 1082 1.0
where beta_fixed represents \beta and beta_random represents \alpha. Below you can find the entire code, split up in the part where I define the data, then the stan program and finally the pystan api call
np.random.seed(1656)
Nobs = 4500
Ngroups = 20
group_counts = [int(np.round(rdm_split * 4500)) for
rdm_split in np.random.dirichlet(alpha=[1 for _ in range(Ngroups)], size=1).flatten()[:-1]]
group_counts.append(Nobs - np.sum(group_counts))
Nobs_per_group = dict([(i,group_counts[i]) for i in range(Ngroups)])
def group_hot_encoding(hot_indices,nobs):
a = np.zeros((nobs,))
a[hot_indices] = 1.0
return a
def random_effect_design_matrix(nobs_per_group):
nobs = np.sum([nobs_per_group[i] for i in nobs_per_group.keys()])
rows = []
for group_idx in range(len(nobs_per_group.keys())):
group_idx = int(group_idx)
start_idx = int(np.sum([nobs_per_group[i] for i in range(group_idx)]))
hot_indices = range(start_idx,start_idx+nobs_per_group[group_idx])
rows.append(group_hot_encoding(hot_indices,nobs))
return np.transpose(np.array(rows))
X_random_effect = random_effect_design_matrix(Nobs_per_group)
X_fixed_effect = sm.add_constant(np.column_stack((uniform.rvs(size=Nobs),uniform.rvs(size=Nobs))))
beta_fixed = np.array([1,0.2,-0.75])
beta_random = np.array([0.579, -0.115, -0.125, 0.169, -0.5, -1.429, -1.171, -0.205, 0.193, 0.041, -0.917, -0.353, -1.197,
1.044, 1.084, -0.085, -0.886, -0.352, -1.398, 0.35])
model_data = {}
model_data['N'] = len(Y)
model_data['D_fixed'] = X_fixed_effect.shape[1]
model_data['D_random'] = X_random_effect.shape[1]
model_data['Y'] = Y
model_data['X_fixed'] = X_fixed_effect
model_data['X_random'] = X_random_effect
model_data['prior_mean_beta_fixed'] = np.zeros_like(beta_fixed)
model_data['prior_cov_beta_fixed'] = 100 * np.eye(len(beta_fixed))
model_data['prior_mean_beta_random'] = np.zeros_like(beta_random)
model_data['prior_cov_beta_random'] = np.eye(len(beta_random))
stan_code = """
data {
int<lower=0> N;
int<lower=0> D_fixed;
int<lower=0> D_random;
matrix[N,D_fixed] X_fixed;
matrix[N,D_random] X_random;
vector[N] Y;
vector[D_fixed] prior_mean_beta_fixed;
matrix[D_fixed,D_fixed] prior_cov_beta_fixed;
vector[D_random] prior_mean_beta_random;
matrix[D_random,D_random] prior_cov_beta_random;
}
parameters {
vector[D_fixed] beta_fixed;
vector[D_random] beta_random;
real<lower=0> sigma_eps;
real<lower=0> sigma_random;
}
transformed parameters {
}
model {
sigma_random ~ cauchy(0,25);
sigma_eps ~ cauchy(0,25);
beta_fixed ~ multi_normal(prior_mean_beta_fixed, prior_cov_beta_fixed);
beta_random ~ multi_normal(prior_mean_beta_random, sigma_random * prior_cov_beta_random);
Y ~ normal(X_fixed * beta_fixed + X_random * beta_random, sigma_eps);
}
"""
fit = pystan.stan(model_code=stan_code, data=model_data, iter=10000, chains=3, thin=10, warmup=6000, n_jobs=3)
nlines = 30
output = str(fit).split('\n')
for line in output:
print(line)
I re-ran with 20k iterations per chain and used the model diagnostics that Michael Betancourt kindly provides on his github and all looks good
> stan_utility.check_all_diagnostics(fit)
n_eff / iter looks reasonable for all parameters
Rhat looks reasonable for all parameters
0.0 of 4200 iterations ended with a divergence (0.0%)
0 of 4200 iterations saturated the maximum tree depth of 10 (0.0%)
E-BFMI indicated no pathological behavior
Thanks for the help