Hm, yeah, I should have commented that more thoroughly. Sorry.

My approach to a centered version would be this:

### Centered model code

```
data{
  int<lower=1> N;                  // number of observations
  int<lower=1> N_id;               // number of subjects
  int id[N];                       // subject index for each observation
  real ce[N];                      // observed certainty equivalents
  real high[N];                    // high outcome of each lottery
  real low[N];                     // low outcome of each lottery
  real p[N];                       // probability of the high outcome
}
transformed data{
  real minus_log_p[N];
  vector[N] difference;
  for (i in 1:N){
    minus_log_p[i] = -log(p[i]);
    difference[i] = high[i] - low[i];
  }
}
parameters{
  vector[3] mus;                   // population means of (rho, log alpha, log beta)
  vector<lower=0>[3] sds;          // population standard deviations
  cholesky_factor_corr[3] L_omega; // Cholesky factor of the correlation matrix
  vector[3] theta[N_id];           // per-subject parameters (centered)
  real<lower=0> tau;               // scale of the response noise
}
model{
  vector[N] sigma = tau * difference;
  vector[N] wp;
  vector[N] u_low;
  vector[N] u_high;
  vector[N] mu;
  vector[N_id] rho;
  vector[N_id] alpha;
  vector[N_id] beta;

  // priors
  sds ~ exponential(5);
  L_omega ~ lkj_corr_cholesky(4);
  mus[1] ~ normal(0, 0.1);
  mus[2] ~ normal(-0.35, 0.1);
  mus[3] ~ normal(0, 0.1);
  tau ~ normal(0.2, 0.1);

  // centered hierarchical prior on the subject-level parameters
  theta ~ multi_normal_cholesky(mus, diag_pre_multiply(sds, L_omega));

  // transform to the natural scale (alpha and beta must be positive)
  for (n_id in 1:N_id){
    rho[n_id] = theta[n_id,1];
    alpha[n_id] = exp(theta[n_id,2]);
    beta[n_id] = exp(theta[n_id,3]);
  }

  // likelihood: exponential utility, probability weighting, and the
  // inverse utility of the weighted expected utility
  for (i in 1:N){
    u_high[i] = (1 - exp(-rho[id[i]] * high[i])) / rho[id[i]];
    u_low[i] = (1 - exp(-rho[id[i]] * low[i])) / rho[id[i]];
    wp[i] = exp(-beta[id[i]] * minus_log_p[i]^alpha[id[i]]);
    mu[i] = log1m((wp[i] * u_high[i] + (1 - wp[i]) * u_low[i]) * rho[id[i]]) / -rho[id[i]];
  }
  ce ~ normal(mu, sigma);
}
generated quantities{
  // recompute the natural-scale parameters so they are saved in the output
  vector[N_id] rho;
  vector[N_id] alpha;
  vector[N_id] beta;
  for (n_id in 1:N_id){
    rho[n_id] = theta[n_id,1];
    alpha[n_id] = exp(theta[n_id,2]);
    beta[n_id] = exp(theta[n_id,3]);
  }
}
```
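To make the likelihood easier to check, here is the per-observation mean from the model block as a plain-Python sketch of the same arithmetic (the function name `predicted_ce` is mine, not part of the model):

```python
import math

def predicted_ce(rho, alpha, beta, high, low, p):
    """Predicted certainty equivalent for one two-outcome lottery.

    Mirrors the Stan model block: exponential utility
    u(x) = (1 - exp(-rho * x)) / rho, a two-parameter probability
    weight w(p) = exp(-beta * (-log p)^alpha), and the inverse
    utility applied to the weighted expected utility.
    """
    u_high = (1 - math.exp(-rho * high)) / rho
    u_low = (1 - math.exp(-rho * low)) / rho
    wp = math.exp(-beta * (-math.log(p)) ** alpha)
    eu = wp * u_high + (1 - wp) * u_low
    # Stan's log1m(x) is log(1 - x); here: mu = log1m(eu * rho) / -rho
    return math.log1p(-eu * rho) / -rho
```

Note that with `alpha = beta = 1` the weight reduces to `p` itself, and as `rho` goes to 0 the utility becomes linear, so the predicted CE approaches the weighted average of the two outcomes.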

Running this I got some warnings:

```
Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess

Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess
```

But the results are basically identical. The centered version has slight problems recovering the standard deviations of theta:

### Centered model results

```
> print(posterior_hierarchical_centered, c("mus", "sds", "L_omega"))
Inference for Stan model: full_stan_model_hierarchical_centered.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
              mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
mus[1]        0.02    0.00 0.01  0.00  0.01  0.02  0.03  0.04  1729 1.00
mus[2]       -0.34    0.00 0.07 -0.47 -0.38 -0.34 -0.29 -0.20  1780 1.00
mus[3]        0.01    0.00 0.08 -0.15 -0.04  0.01  0.06  0.16  3019 1.00
sds[1]        0.06    0.00 0.01  0.04  0.05  0.05  0.06  0.08   788 1.00
sds[2]        0.36    0.00 0.10  0.17  0.29  0.35  0.42  0.56   448 1.01
sds[3]        0.70    0.00 0.10  0.51  0.63  0.69  0.76  0.92  1370 1.00
L_omega[1,1]  1.00     NaN 0.00  1.00  1.00  1.00  1.00  1.00   NaN  NaN
L_omega[1,2]  0.00     NaN 0.00  0.00  0.00  0.00  0.00  0.00   NaN  NaN
L_omega[1,3]  0.00     NaN 0.00  0.00  0.00  0.00  0.00  0.00   NaN  NaN
L_omega[2,1] -0.27    0.01 0.21 -0.66 -0.42 -0.27 -0.12  0.17  1146 1.00
L_omega[2,2]  0.94    0.00 0.07  0.75  0.91  0.96  0.99  1.00  1060 1.00
L_omega[2,3]  0.00     NaN 0.00  0.00  0.00  0.00  0.00  0.00   NaN  NaN
L_omega[3,1] -0.52    0.00 0.15 -0.75 -0.62 -0.53 -0.43 -0.19  1337 1.00
L_omega[3,2] -0.27    0.00 0.18 -0.60 -0.40 -0.28 -0.16  0.10  1548 1.00
L_omega[3,3]  0.77    0.00 0.12  0.51  0.69  0.78  0.86  0.96  1269 1.00
Samples were drawn using NUTS(diag_e) at Wed Sep 25 17:27:48 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
```

While the non-centered model had slight problems recovering the correlation coefficients (this model, however, did not produce any warnings):

### Non-centered model results

```
> print(posterior_hierarchical, c("mus", "sds", "L_omega"))
Inference for Stan model: full_stan_model_hierarchical.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
              mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
mus[1]        0.02    0.00 0.01  0.00  0.01  0.02  0.03  0.04  2933 1.00
mus[2]       -0.33    0.00 0.07 -0.47 -0.38 -0.33 -0.29 -0.20  5147 1.00
mus[3]        0.01    0.00 0.08 -0.15 -0.05  0.01  0.06  0.16  4983 1.00
sds[1]        0.06    0.00 0.01  0.04  0.05  0.06  0.06  0.08  2859 1.00
sds[2]        0.35    0.00 0.10  0.14  0.28  0.35  0.41  0.55  1396 1.00
sds[3]        0.70    0.00 0.10  0.52  0.63  0.70  0.77  0.92  1301 1.01
L_omega[1,1]  1.00     NaN 0.00  1.00  1.00  1.00  1.00  1.00   NaN  NaN
L_omega[1,2]  0.00     NaN 0.00  0.00  0.00  0.00  0.00  0.00   NaN  NaN
L_omega[1,3]  0.00     NaN 0.00  0.00  0.00  0.00  0.00  0.00   NaN  NaN
L_omega[2,1] -0.26    0.00 0.22 -0.67 -0.41 -0.26 -0.10  0.19  3294 1.00
L_omega[2,2]  0.94    0.00 0.07  0.74  0.91  0.96  0.99  1.00  2781 1.00
L_omega[2,3]  0.00     NaN 0.00  0.00  0.00  0.00  0.00  0.00   NaN  NaN
L_omega[3,1] -0.52    0.00 0.14 -0.77 -0.62 -0.53 -0.44 -0.20   956 1.00
L_omega[3,2] -0.27    0.01 0.18 -0.60 -0.40 -0.28 -0.16  0.13   428 1.02
L_omega[3,3]  0.77    0.01 0.12  0.52  0.69  0.78  0.85  0.96   495 1.01
Samples were drawn using NUTS(diag_e) at Wed Sep 25 17:26:12 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
```

The choice of parameterization also depends on the data you have, so I can’t say which model is more appropriate for your full data set – but you can try both.
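For completeness: the two parameterizations describe the same distribution over theta. The non-centered version samples standardized deviates z ~ std_normal() and reconstructs theta deterministically as `mus + diag_pre_multiply(sds, L_omega) * z`. A plain-Python sketch of that reconstruction (the function name `theta_from_z` is mine):

```python
def theta_from_z(mus, sds, L, z):
    """Non-centered reconstruction: theta = mus + diag(sds) @ L @ z.

    mus, sds, z are lists of length k; L is a lower-triangular
    Cholesky factor of the correlation matrix (list of rows).
    """
    k = len(mus)
    theta = []
    for i in range(k):
        scaled = sum(L[i][j] * z[j] for j in range(k))  # (L @ z)[i]
        theta.append(mus[i] + sds[i] * scaled)
    return theta
```

With z = 0 this returns the population means; sampling in z-space moves the geometry problems out of the hierarchical prior, which is why the non-centered version often mixes better when the subject-level data are sparse.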

Cheers!