Posterior mean far away from true parameter in GLMM

When I fit the following linear mixed model in Stan

y_{i,j} = \mathbf{x}_{i,j}^T\beta + \alpha_i + \epsilon_{i,j}

where \mathbf{x}_{i,j} includes a constant (so the intercept \beta_0 is the first element of \beta), \epsilon_{i,j}\sim\mathcal{N}(0,\sigma_{\epsilon}^2),\ \beta_k\sim\mathcal{N}(0,100),\ \alpha_i\sim\mathcal{N}(0,\sigma_r^2),\ \sigma_r^2\sim\text{Half-Cauchy}(0,25), with I=20 groups and N=4500 observations in total,

the converged posterior means are far away from the true parameters (the data are simulated). For all parameters I observe a large effective sample size and a Gelman-Rubin statistic of 1.

The true parameters are \beta = [1,\ 0.2,\ -0.75]^T and \alpha = [0.579,\ -0.115,\ -0.125,\ \ldots]^T.

These are the outputs:
                 mean  n_eff  Rhat
beta_fixed[0]    7.71    913   1.0
beta_fixed[1]   -0.02   1200   1.0
beta_fixed[2]   -0.05   1200   1.0
beta_random[0]  -0.72    848   1.0
beta_random[1]  -0.15   1189   1.0
beta_random[2]  -0.83   1082   1.0

where beta_fixed represents \beta and beta_random represents \alpha. Below is the entire code, split into three parts: the data definition, the Stan program, and the PyStan API call.


import numpy as np
import statsmodels.api as sm
from scipy.stats import uniform
import pystan

np.random.seed(1656)

Nobs = 4500
Ngroups = 20

# Split the Nobs observations across the Ngroups groups with Dirichlet weights,
# forcing the counts to sum exactly to Nobs.
group_weights = np.random.dirichlet(alpha=np.ones(Ngroups))
group_counts = [int(np.round(w * Nobs)) for w in group_weights[:-1]]
group_counts.append(Nobs - np.sum(group_counts))
Nobs_per_group = {i: group_counts[i] for i in range(Ngroups)}

def group_hot_encoding(hot_indices, nobs):
    # Indicator vector of length nobs with ones at the given positions.
    a = np.zeros(nobs)
    a[hot_indices] = 1.0
    return a

def random_effect_design_matrix(nobs_per_group):
    # Build an N x Ngroups one-hot design matrix: column g indicates
    # membership in group g, with each group's rows stored contiguously.
    nobs = int(np.sum(list(nobs_per_group.values())))
    rows = []
    for group_idx in range(len(nobs_per_group)):
        start_idx = int(np.sum([nobs_per_group[i] for i in range(group_idx)]))
        hot_indices = range(start_idx, start_idx + nobs_per_group[group_idx])
        rows.append(group_hot_encoding(hot_indices, nobs))
    return np.transpose(np.array(rows))
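
# Illustrative check (not in the original post): for two groups of sizes 2 and 1,
# random_effect_design_matrix({0: 2, 1: 1}) returns the 3 x 2 indicator matrix
# [[1., 0.],
#  [1., 0.],
#  [0., 1.]]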

X_random_effect = random_effect_design_matrix(Nobs_per_group)
X_fixed_effect = sm.add_constant(np.column_stack((uniform.rvs(size=Nobs),uniform.rvs(size=Nobs))))

beta_fixed = np.array([1, 0.2, -0.75])
beta_random = np.array([0.579, -0.115, -0.125, 0.169, -0.5, -1.429, -1.171, -0.205, 0.193, 0.041, -0.917, -0.353, -1.197,
                        1.044, 1.084, -0.085, -0.886, -0.352, -1.398, 0.35])

# The original post omits the simulation of Y; it is reconstructed here from
# the model definition, with the noise scale sigma_eps assumed to be 1.
Y = X_fixed_effect @ beta_fixed + X_random_effect @ beta_random + np.random.normal(0, 1.0, size=Nobs)

model_data = {}
model_data['N'] = len(Y)
model_data['D_fixed'] = X_fixed_effect.shape[1]
model_data['D_random'] = X_random_effect.shape[1]
model_data['Y'] = Y
model_data['X_fixed'] = X_fixed_effect
model_data['X_random'] = X_random_effect
model_data['prior_mean_beta_fixed'] = np.zeros_like(beta_fixed)
model_data['prior_cov_beta_fixed'] = 100 * np.eye(len(beta_fixed))
model_data['prior_mean_beta_random'] = np.zeros_like(beta_random)
model_data['prior_cov_beta_random'] = np.eye(len(beta_random))
stan_code = """
data {
    int<lower=0> N;
    int<lower=0> D_fixed;
    int<lower=0> D_random;
    matrix[N,D_fixed]  X_fixed;
    matrix[N,D_random] X_random;
    vector[N] Y;
    
    vector[D_fixed]           prior_mean_beta_fixed;
    matrix[D_fixed,D_fixed]   prior_cov_beta_fixed;
    
    vector[D_random]            prior_mean_beta_random;
    matrix[D_random,D_random]   prior_cov_beta_random;
}
parameters {
    vector[D_fixed] beta_fixed;
    vector[D_random] beta_random;
    real<lower=0> sigma_eps;
    real<lower=0> sigma_random;
}
model {
    // the <lower=0> constraints truncate these to half-Cauchy priors
    sigma_random ~ cauchy(0,25);
    sigma_eps ~ cauchy(0,25);
    beta_fixed  ~ multi_normal(prior_mean_beta_fixed, prior_cov_beta_fixed);
    // sigma_random scales the covariance, so it plays the role of the
    // variance sigma_r^2 in the model statement above
    beta_random ~ multi_normal(prior_mean_beta_random, sigma_random * prior_cov_beta_random);
    
    Y ~ normal(X_fixed * beta_fixed + X_random * beta_random, sigma_eps);
}
"""
fit = pystan.stan(model_code=stan_code, data=model_data, iter=10000, chains=3, thin=10, warmup=6000, n_jobs=3)
# 3 chains x (10000 - 6000) post-warmup iterations / thin = 10 gives 1200 retained draws
for line in str(fit).split('\n'):
    print(line)

I re-ran with 20k iterations per chain and used the model diagnostics that Michael Betancourt kindly provides on his GitHub; everything looks good:

> stan_utility.check_all_diagnostics(fit)

n_eff / iter looks reasonable for all parameters
Rhat looks reasonable for all parameters
0.0 of 4200 iterations ended with a divergence (0.0%)
0 of 4200 iterations saturated the maximum tree depth of 10 (0.0%)
E-BFMI indicated no pathological behavior

Thanks for the help

There is no guarantee that the posterior mean will be close to the true values. Even in ideal circumstances the best we can typically hope for is that the true value will be contained within the posterior, meaning that a more appropriate test would be whether or not the true value lies within, say, the 5% and 95% posterior quantiles.
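
For the fit in the question, that check is only a few lines; here is a sketch reusing the question's fit object and the true beta_fixed values (the 5%/95% bounds are one reasonable choice, not the only one):

samples = fit.extract(permuted=True)
draws = samples['beta_fixed']                    # shape: (n_draws, D_fixed)
lo, hi = np.percentile(draws, [5, 95], axis=0)   # pointwise 5% and 95% posterior quantiles
for k, truth in enumerate(beta_fixed):
    print('beta_fixed[%d]: true=%.2f, 90%% interval=(%.2f, %.2f), covered=%s'
          % (k, truth, lo[k], hi[k], lo[k] <= truth <= hi[k]))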

That said, even in those ideal circumstances a specific posterior interval like this won't always contain the true value; ultimately you have to look at as many simulated observations as possible. The best approach is to sample true parameter values from your priors and then simulate observations from those truths.
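
In code, that workflow looks roughly like the following sketch, assuming the variables from the question are in scope; for brevity it redraws only beta_fixed from its prior and tracks coverage for the intercept, while a full check would also redraw alpha and the variance parameters:

sm_compiled = pystan.StanModel(model_code=stan_code)  # compile once, reuse across refits

n_sims = 100
n_covered = 0
for s in range(n_sims):
    # Draw a "truth" from the prior: sd 10 corresponds to the N(0, 100) prior above.
    beta_true = np.random.normal(0, 10, size=3)
    # Simulate observations from that truth, reusing the question's design matrices.
    y_sim = X_fixed_effect @ beta_true + X_random_effect @ beta_random + np.random.normal(0, 1, size=Nobs)
    sim_fit = sm_compiled.sampling(data=dict(model_data, Y=y_sim), iter=2000, chains=2)
    # Record whether the truth lands inside the central 90% posterior interval.
    lo, hi = np.percentile(sim_fit.extract()['beta_fixed'][:, 0], [5, 95])
    n_covered += (lo <= beta_true[0] <= hi)

print('coverage for beta_fixed[0]: %.2f (expect roughly 0.90)' % (n_covered / n_sims))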