# Centered variance-covariance matrix for hierarchical model

Still not there I am afraid. I have been playing around with the starting values and some changes in the way the model is written, but to no avail. I now get the following error:

Error evaluating the log probability at the initial value.
Chain 4: Exception: validate transformed params: b_id[1] is nan, but must be greater than or equal to -0.8

(the -0.8 comes from a constraint on b being positive)

Oddly, the error seems to be caused by whatever hierarchical parameter is in the second column of v_N_id, i.e. if I switch the two around it is the other parameter giving me the error. It also seems strange that the same model works well without the var-covar matrix, and even with the non-centered var-cov matrix it works’ (except for convergence being terrible). I thus suspect that I still have a problem in the code, but I just cannot figure out where.

Can you post your data (or a subsample) here so I can have a look? Also, the only constraint is b>0? Or also y>0?

The one I posted was supposed to be a minimal example, the actual one is a little more complex. There are 3 hierarchical parameters, two of which are constraint to be non-negative. I attach the full model and a minimal dataset I am using to test the model.

data_example.csv (14.4 KB) full_stan_model.stan (1.5 KB)

I think to ensure \alpha_{id[i]} > 0 and \beta_{id[i]} > 0 it is better to model \texttt{log_alpha}_{id[i]} directly.

Its a bit hard for me to follow all de details here, but parts of the models are super prone to blow up / collapse… for example (lines 43 +44) stuff like \exp(-\rho_{id[i]}high), where \rho is not restricted to be positive and high>20 seems strange. Should that be \log(high)?

Also, line 45, \exp(-\beta_{id[i]}(-\log p^{\alpha_{id[i]}})), seems awkward.

I tried some stuff here, but, basically at wp everything breaks down… (but you can see here, how I packed packed together rho, log_alpha, and log_beta to a new vector theta… This coding would be a non-centered parameterization though, since I usually try those first, but I could even get the model to run, because of the issues with the computations [see above]):

data{
int<lower=1> N;
int<lower=1> N_id;
int id[N];
vector[N] ce;
vector[N] high;
vector[N] low;
vector[N] p;
}
parameters{
vector[3] mus;
real<lower=0> tau;
vector<lower=0>[3] sigma_sub;
cholesky_factor_corr[3] L_Rho_sub;
vector[3] Z[N_id];
}
transformed parameters{
vector[3] theta[N_id];
vector[N_id] rho;
vector<lower=0>[N_id] alpha;
vector<lower=0>[N_id] beta;
vector<lower=0>[N] sigma = tau * (high - low);
vector[N] u_high;
vector[N] u_low;
vector[N] wp;
vector[N] mu;

for (n_id in 1:N_id){
theta[n_id] = mus + diag_pre_multiply(sigma_sub, L_Rho_sub) * Z[n_id];
rho[n_id] = theta[n_id, 1];
alpha[n_id] = exp(theta[n_id, 2]);
beta[n_id] = exp(theta[n_id, 3]);
}

u_high = (1 - exp(-(rho[id]) .* high)) ./ rho[id];
u_low = (1 - exp(-(rho[id]) .* low)) ./ rho[id];

for (i in 1:N){
wp[i] = exp(-beta[id[i]] * (-log(p[i])^(alpha[id[i]])));
mu[i] = (log(-(((wp[i] * u_high[i] + (1 - wp[i]) * u_low[i]) * (rho[id[i]])) - 1))) / (-(rho[id[i]]));
}

print("u_high:", u_high);
print("u_low:", u_low);
print("wp:", wp);
print("mu:", mu);

}
model{
mus[1] ~ normal( 0 , 0.1 ); // prior mu for rho
mus[2] ~ normal( -0.35 , 0.1 ); // prior mu for log(alpha)
mus[3] ~ normal( 0, 0.1); // prior mu for log(beta)

tau ~ normal( 0.2 , 0.1 );

sigma_sub ~ std_normal( );
L_Rho_sub ~ lkj_corr_cholesky( 3 );
for (n_id in 1:N_id)
Z[n_id] ~ std_normal();

ce ~ normal( mu , sigma );
}



Is this some model from the literature that you can link to? Some paper describing u_high, u_low, and wp?

Hi Max, I appreciate all your efforts on this. Yes, it is a fairly standard model of decision making under risk, and I have used these functional forms many times. See the following link for the “awkward” function:

The issue that bothers me is that my non-centered version of the model (the first minimal example I posted above) actually runs fine even with the non-negativity constraints, except that it struggles to converge to something meaningful. Notice that to simplify, one could also define u_high = high^rho[id], and redefine high_std = high/40 and low_std = low/40. Again, I do not think the functional forms by themselves are the issue, since they work fine without the var-covar matrix.

Yes, it is a fairly standard model of decision making under risk, and I have used these functional forms many times. See the following link for the “awkward” function

Sorry I didn’t mean to sound snarky. It’s just that those kind of thing run into numerical instabilities super fast, and with hierarchical models you generally need some flexibility…

Anyways, running the following with iter = 2000 and init_r = 0.5 actually worked:

data{
int<lower=1> N;
int<lower=1> N_id;
int id[N];
real ce[N];
real high[N];
real low[N];
real p[N];
}
transformed data{
real minus_log_p[N];

for (i in 1:N)
minus_log_p[i] = -log(p[i]);
}
parameters{
vector[3] mus;
vector<lower=0>[3] sds;
cholesky_factor_corr[3] L_omega;
vector[3] Z[N_id];

real<lower=0> tau;

}
transformed parameters{
vector[3] theta[N_id];
vector[N_id] rho;
vector<lower=0>[N_id] alpha;
vector<lower=0>[N_id] beta;

for (n_id in 1:N_id){
theta[n_id] = mus + diag_pre_multiply(sds, L_omega) * Z[n_id];
rho[n_id] = theta[n_id,1];
alpha[n_id] = exp(theta[n_id,2]);
beta[n_id] = exp(theta[n_id,3]);
}

}
model{
vector[N] sigma;
vector[N] wp;
vector[N] u_low;
vector[N] u_high;
vector[N] mu;

sds ~ exponential(5);
L_omega ~ lkj_corr_cholesky(4);
mus[1] ~ normal(0, 0.1);
mus[2] ~ normal(-0.35, 0.1);
mus[3] ~ normal(0, 0.1);

for (n_id in 1:N_id)
Z[n_id] ~ std_normal();

tau ~ normal( 0.2 , 0.1 );

for ( i in 1:N ) {

u_high[i] = (1 - exp(-rho[id[i]] * high[i])) / rho[id[i]];
u_low[i] = (1 - exp(-rho[id[i]] * low[i])) / rho[id[i]];

wp[i] = exp( -beta[id[i]] * minus_log_p[i]^alpha[id[i]]);

sigma[i] = tau * (high[i] - low[i]);

mu[i] = log1m((wp[i] * u_high[i] + (1 - wp[i]) * u_low[i]) * rho[id[i]]) / -rho[id[i]];

}
ce ~ normal( mu , sigma );
}


1 Like

@Ferdinand_Vieider die this work for you as well? Otherwise I can also post the centered version of the code above, if you need it.

Cheers!

Hi Max, sorry, I got somewhat side-tracked.

The model works very well in the version you posted–thanks a lot! I had some difficulties interpreting it at first, but this was just due to the re-definition of some parameters in your model that I missed initially. I got it now (hence the edit of my original reply).

Nevertheless, if you also have a centered version it would be great if you could post that. I would be very keen on seeing how that works, and on running some comparisons between the two models to see how their performance compares on my data.

Cheers,
Ferdinand

Hm, yeah, I should have commented that more thoroughly. Sorry.

My approach to a centered version would be this:

### Centered model code

data{
int<lower=1> N;
int<lower=1> N_id;
int id[N];
real ce[N];
real high[N];
real low[N];
real p[N];
}

transformed data{
real minus_log_p[N];
vector[N] difference;

for (i in 1:N){
minus_log_p[i] = -log(p[i]);
difference[i] = high[i] - low[i];
}
}

parameters{
vector[3] mus;
vector<lower=0>[3] sds;
cholesky_factor_corr[3] L_omega;
vector[3] theta[N_id];

real<lower=0> tau;

}

transformed parameters{

}

model{
vector[N] sigma = tau * difference;
vector[N] wp;
vector[N] u_low;
vector[N] u_high;
vector[N] mu;
vector[N_id] rho;
vector[N_id] alpha;
vector[N_id] beta;

sds ~ exponential(5);
L_omega ~ lkj_corr_cholesky(4);
mus[1] ~ normal(0, 0.1);
mus[2] ~ normal(-0.35, 0.1);
mus[3] ~ normal(0, 0.1);

theta ~ multi_normal_cholesky(mus, diag_pre_multiply(sds, L_omega));

for (n_id in 1:N_id){
rho[n_id] = theta[n_id,1];
alpha[n_id] = exp(theta[n_id,2]);
beta[n_id] = exp(theta[n_id,3]);
}

tau ~ normal( 0.2 , 0.1 );

for ( i in 1:N ) {

u_high[i] = (1 - exp(-rho[id[i]] * high[i])) / rho[id[i]];
u_low[i] = (1 - exp(-rho[id[i]] * low[i])) / rho[id[i]];

wp[i] = exp( -beta[id[i]] * minus_log_p[i]^alpha[id[i]]);
mu[i] = log1m((wp[i] * u_high[i] + (1 - wp[i]) * u_low[i]) * rho[id[i]]) / -rho[id[i]];

}

ce ~ normal(mu, sigma);

}
generated quantities{
vector[N_id] rho;
vector[N_id] alpha;
vector[N_id] beta;

for (n_id in 1:N_id){
rho[n_id] = theta[n_id,1];
alpha[n_id] = exp(theta[n_id,2]);
beta[n_id] = exp(theta[n_id,3]);
}
}


Running this I got some warnings:

Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-essTail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess


But the results are basically identical. The centered versions has slight problems recovering the standard deviations of theta:

### Centered model results

> print(posterior_hierarchical_centered, c("mus", "sds", "L_omega"))
Inference for Stan model: full_stan_model_hierarchical_centered.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
mus[1]        0.02    0.00 0.01  0.00  0.01  0.02  0.03  0.04  1729 1.00
mus[2]       -0.34    0.00 0.07 -0.47 -0.38 -0.34 -0.29 -0.20  1780 1.00
mus[3]        0.01    0.00 0.08 -0.15 -0.04  0.01  0.06  0.16  3019 1.00
sds[1]        0.06    0.00 0.01  0.04  0.05  0.05  0.06  0.08   788 1.00
sds[2]        0.36    0.00 0.10  0.17  0.29  0.35  0.42  0.56   448 1.01
sds[3]        0.70    0.00 0.10  0.51  0.63  0.69  0.76  0.92  1370 1.00
L_omega[1,1]  1.00     NaN 0.00  1.00  1.00  1.00  1.00  1.00   NaN  NaN
L_omega[1,2]  0.00     NaN 0.00  0.00  0.00  0.00  0.00  0.00   NaN  NaN
L_omega[1,3]  0.00     NaN 0.00  0.00  0.00  0.00  0.00  0.00   NaN  NaN
L_omega[2,1] -0.27    0.01 0.21 -0.66 -0.42 -0.27 -0.12  0.17  1146 1.00
L_omega[2,2]  0.94    0.00 0.07  0.75  0.91  0.96  0.99  1.00  1060 1.00
L_omega[2,3]  0.00     NaN 0.00  0.00  0.00  0.00  0.00  0.00   NaN  NaN
L_omega[3,1] -0.52    0.00 0.15 -0.75 -0.62 -0.53 -0.43 -0.19  1337 1.00
L_omega[3,2] -0.27    0.00 0.18 -0.60 -0.40 -0.28 -0.16  0.10  1548 1.00
L_omega[3,3]  0.77    0.00 0.12  0.51  0.69  0.78  0.86  0.96  1269 1.00

Samples were drawn using NUTS(diag_e) at Wed Sep 25 17:27:48 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).


While the non-centered model had slight problems recovering the correlation coefficients (this model however did not throw errors):

### Non-centered model results

> print(posterior_hierarchical, c("mus", "sds", "L_omega"))
Inference for Stan model: full_stan_model_hierarchical.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
mus[1]        0.02    0.00 0.01  0.00  0.01  0.02  0.03  0.04  2933 1.00
mus[2]       -0.33    0.00 0.07 -0.47 -0.38 -0.33 -0.29 -0.20  5147 1.00
mus[3]        0.01    0.00 0.08 -0.15 -0.05  0.01  0.06  0.16  4983 1.00
sds[1]        0.06    0.00 0.01  0.04  0.05  0.06  0.06  0.08  2859 1.00
sds[2]        0.35    0.00 0.10  0.14  0.28  0.35  0.41  0.55  1396 1.00
sds[3]        0.70    0.00 0.10  0.52  0.63  0.70  0.77  0.92  1301 1.01
L_omega[1,1]  1.00     NaN 0.00  1.00  1.00  1.00  1.00  1.00   NaN  NaN
L_omega[1,2]  0.00     NaN 0.00  0.00  0.00  0.00  0.00  0.00   NaN  NaN
L_omega[1,3]  0.00     NaN 0.00  0.00  0.00  0.00  0.00  0.00   NaN  NaN
L_omega[2,1] -0.26    0.00 0.22 -0.67 -0.41 -0.26 -0.10  0.19  3294 1.00
L_omega[2,2]  0.94    0.00 0.07  0.74  0.91  0.96  0.99  1.00  2781 1.00
L_omega[2,3]  0.00     NaN 0.00  0.00  0.00  0.00  0.00  0.00   NaN  NaN
L_omega[3,1] -0.52    0.00 0.14 -0.77 -0.62 -0.53 -0.44 -0.20   956 1.00
L_omega[3,2] -0.27    0.01 0.18 -0.60 -0.40 -0.28 -0.16  0.13   428 1.02
L_omega[3,3]  0.77    0.01 0.12  0.52  0.69  0.78  0.85  0.96   495 1.01

Samples were drawn using NUTS(diag_e) at Wed Sep 25 17:26:12 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).


The choice of parameterization also depends on the data that you have, so I can’t say, which model is more appropriate for your full data set – but you can try both.

Cheers!

This is great, thanks! I will make sure to try out both.

Hi Max,

I would like to ask a follow-up question to this post if I may. Let me know if I should rather open a new question for it and I will do so.

I have done quite a bit of testing using the non-centered model you proposed above, and it works very well. However, now I would like to add an additional level to the hierarchy. I thought something like this ought to do the trick:

for (n_id in 1:N_id){
theta[n_id] = mus + diag_pre_multiply(sds_g, L_omega_g) * Z_g[group[n_id]] + diag_pre_multiply(sds, L_omega) * Z[n_id];
rho[n_id] = theta[n_id,1];
alpha[n_id] = exp(theta[n_id,2]);
beta[n_id] = exp(theta[n_id,3]);
}


for (n_g in 1:N_group)
Z_g[n_g] ~ std_normal();


and the variables defined much as before. It should then be possible to recover the group-level parameters in the generated quantities block.

The issue is, this model does not seem to work. While it still converges quite happily and all looks good at first, it turns out that the model only fills in the first 6-7 entries in Z_g, after which all the values are virtually the same. The lower level entries in Z and the parameters who, alpha, beta still look fine, but somehow the model seems to ignore’ the group level. Is there something obvious I am missing here? Here comes a complete example:

data{
int<lower=1> N;
int<lower=1> N_id;
int<lower=1> N_group;
int id[N];
int group[N];
real ce[N];
real high[N];
real low[N];
real p[N];
}
transformed data{
real minus_log_p[N];
for (i in 1:N)
minus_log_p[i] = -log(p[i]);
}
parameters{
vector[3] mus;
vector<lower=0>[3] sds;
vector<lower=0>[3] sds_g;
cholesky_factor_corr[3] L_omega;
cholesky_factor_corr[3] L_omega_g;
vector[3] Z[N_id];
vector[3] Z_g[N_group];
real<lower=0> tau;
}
transformed parameters{
vector[3] theta[N_id];
vector[N_id] rho;
vector<lower=0>[N_id] alpha;
vector<lower=0>[N_id] beta;
for (n_id in 1:N_id){
theta[n_id] = mus + diag_pre_multiply(sds_g, L_omega_g) * Z_g[group[n_id]] + diag_pre_multiply(sds, L_omega) * Z[n_id];
rho[n_id] = theta[n_id,1];
alpha[n_id] = exp(theta[n_id,2]);
beta[n_id] = exp(theta[n_id,3]);
}
}
model{
vector[N] sigma;
vector[N] wp;
vector[N] u_low;
vector[N] u_high;
vector[N] mu;
sds ~ exponential(5);
sds_g ~ exponential(5);
L_omega ~ lkj_corr_cholesky(4);
L_omega_g ~ lkj_corr_cholesky(4);
mus[1] ~ normal(0, 0.1);
mus[2] ~ normal(-0.35, 0.1);
mus[3] ~ normal(0, 0.1);
for (n_id in 1:N_id)
Z[n_id] ~ std_normal();
for (n_g in 1:N_group)
Z_g[n_g] ~ std_normal();
tau ~ normal( 0.2 , 0.1 );
for ( i in 1:N ) {
u_high[i] = (1 - exp(-rho[id[i]] * high[i])) / rho[id[i]];
u_low[i] = (1 - exp(-rho[id[i]] * low[i])) / rho[id[i]];
wp[i] = exp( -beta[id[i]] * minus_log_p[i]^alpha[id[i]]);
sigma[i] = tau * (high[i] - low[i]);
mu[i] = log1m((wp[i] * u_high[i] + (1 - wp[i]) * u_low[i]) * rho[id[i]]) / -rho[id[i]];
}
ce ~ normal( mu , sigma );
}
generated quantities{
vector[3] theta_g[N_group];
vector[N_group] rho_g;
vector[N_group] alpha_g;
vector[N_group] beta_g;
for (n_g in 1:N_group){
theta_g[n_g] = mus + diag_pre_multiply(sds_g, L_omega_g) * Z_g[n_g];
rho_g[n_g] = theta_g[n_g,1];
alpha_g[n_g] = exp(theta_g[n_g,2]);
beta_g[n_g] = exp(theta_g[n_g,3]);
}
}


Any ideas what I am doing wrong here?

Cheers,
Ferdinand

I think this is a simple indexing error.

This should work:

data{
int<lower=1> N;
int<lower=1> N_id;
int<lower=1> N_group;
int id[N];
int group[N];
real ce[N];
real high[N];
real low[N];
real p[N];
}
transformed data{
real minus_log_p[N];

for (i in 1:N)
minus_log_p[i] = -log(p[i]);
}
parameters{
vector[3] mus;
vector<lower=0>[3] sds;
vector<lower=0>[3] sds_g;

cholesky_factor_corr[3] L_omega;
cholesky_factor_corr[3] L_omega_g;

vector[3] Z[N_id];
vector[3] Z_g[N_group];

real<lower=0> tau;

}
transformed parameters{
vector[3] theta[N];
vector[N] rho;
vector<lower=0>[N] alpha;
vector<lower=0>[N] beta;

for (n in 1:N){
theta[n] = mus + diag_pre_multiply(sds_g, L_omega_g) * Z_g[group[n]] + diag_pre_multiply(sds, L_omega) * Z[id[n]];;
rho[n] = theta[n,1];
alpha[n] = exp(theta[n,2]);
beta[n] = exp(theta[n,3]);
}

}
model{
vector[N] sigma;
vector[N] wp;
vector[N] u_low;
vector[N] u_high;
vector[N] mu;

sds ~ exponential(5);
sds_g ~ exponential(5);
L_omega ~ lkj_corr_cholesky(4);
L_omega_g ~ lkj_corr_cholesky(4);
mus[1] ~ normal(0, 0.1);
mus[2] ~ normal(-0.35, 0.1);
mus[3] ~ normal(0, 0.1);

for (n_id in 1:N_id)
Z[n_id] ~ std_normal();

for (n_g in 1:N_group)
Z_g[n_g] ~ std_normal();

for ( i in 1:N ) {

u_high[i] = (1 - exp(-rho[i] * high[i])) / rho[i];
u_low[i] = (1 - exp(-rho[i] * low[i])) / rho[i];

wp[i] = exp( -beta[i] * minus_log_p[i]^alpha[i]);

sigma[i] = tau * (high[i] - low[i]);

mu[i] = log1m((wp[i] * u_high[i] + (1 - wp[i]) * u_low[i]) * rho[i]) / -rho[i];
}

ce ~ normal( mu , sigma );
}



Let me know if it does(n’t) work!

Cheers! :)
Max

Hi Max,

thank you for the suggestion, but this does not seem to be it. In the model you proposed on September 23–and on which I based my modification above–the theta vector is defined at the level of the id identifier, i.e. theta[n_id]. Defining it at the level of the observation n means that the parameters are no longer identified, since one single datapoint is not sufficient to identify a parameter.

I attach a minimal dataset with the group identifier, so that you can see the structure of the data: example_data_group.csv (81.4 KB) . I also attach the output I obtain using these data for Z_g from running the model I posted above: output_Z_g.csv (3.0 KB).

The first three parameter vectors/rows of Z_g would appear to be on the correct scale, with little variation thereafter. So I suspect indeed that there is some indexing issue, but I just cannot figure out how to do this correctly. Any ideas?

Thanks a lot!

Ferdinand

Hi Ferdinand!

Did you actually try the model that I posted? Given the data that you uploaded it seems to work well (not sure though what output you expect). I actually just fixed the indexing. You can see that the model block is literally identical to the model you posted.

Defining it at the level of the observation n means that the parameters are no longer identified, since one single datapoint is not sufficient to identify a parameter.

TBH this triggers me a bit. Ignoring for a moment that 1) this is not what’s happening in the model I posted (same parameters block) and that 2) this might be a valid point in frequentist estimation, but is a kind of weird (if not wrong) way to think about Bayesian hierarchical models—this statement sound very condescending. Apologies if this was not you intention, but I see this attitude a lot from (dogmatic) Econometricians (I’m at an Econ department)—and it’s annoying. I’m actually pretty sure that you didn’t meant it that way, but I had to get this off of my chest. Sorry! haha

With that out of the way: I ran the model on the data you uploaded (it took a while) with init_r = 0.5, iter = 2000, chains = 4 and got kind of reasonable results. You should either try the centered parameterization (although I doubt it’ll do a much better job) or run the chains for a few more iterations, as I got some Bulk-ESS low warning (especially for the correlation parameters).

I’m not sure why you would want to look at Z or Z_g so I added a generated quantities block to the model.

generated quantities{
vector[N_group] rho_group;
vector[N_id] rho_id;
vector[N_group] alpha_group;
vector[N_id] alpha_id;
vector[N_group] beta_group;
vector[N_id] beta_id;

{
vector[3] dump;
for(n_g in 1:N_group){
dump = mus + diag_pre_multiply(sds_g, L_omega_g)*Z_g[n_g];
rho_group[n_g] = dump[1];
alpha_group[n_g] = exp(dump[2]);
beta_group[n_g] = exp(dump[3]);
}
for(n_id in 1:N_id){
dump = mus + diag_pre_multiply(sds, L_omega)*Z[n_id];
rho_id[n_id] = dump[1];
alpha_id[n_id] = exp(dump[2]);
beta_id[n_id] = exp(dump[3]);
}
}
}


This will compute the implied parameters for each group/id (not combinations of those though).

Here are the 95% posterior quantiles (plus posterior medians):

### For GROUPs

Seems plausible, no?

Cheers!

1 Like

Hi Max,

apologies if this sounded condescending, that was not my intention at all. I suppose I was confused by getting N theta vectors and corresponding parameters instead of just the N_id vectors I was looking for. So yes, I did run the model you proposed, but I saw some behaviour that was similar in nature to what I had seen using my original indexation (in addition to my confusion about the change of index). Your new generated quantities block is indeed illuminating in this respect.

The issue is that, given the data, I would expect to see much more change in the parameters between groups. The reason I looked at Z_g directly was to try and get at the source of what is happening, to make sure the mistake was not sneaking in somewhere else. In the matrix I posted, it seems very odd that the first three lines come in at one scale (which I might consider close to what I would have expected), and then drop by one or two orders of magnitude and change hardly at all.

This indeed seem to trickle down to the final parameter estimates at the group level as you post them. Especially for alpha_group, there appears to be hardly any change at all across groups, and beta_group seems to be only little better. If I just play around with my nonparametric data, that does not correspond to what I see. Maybe I am misunderstanding something here, too, but I thought/think this points to an estimation issue.

Well, I suppose this just means I am still much farther from what I wanted to do than I had thought, or maybe that this is not possible at all. Thank you very much for all your help–I definitely have learned quite a few things in the process.

Cheers,

Ferdinand

1 Like

The issue is that, given the data, I would expect to see much more change in the parameters between groups. The reason I looked at Z_g directly was to try and get at the source of what is happening, to make sure the mistake was not sneaking in somewhere else. In the matrix I posted, it seems very odd that the first three lines come in at one scale (which I might consider close to what I would have expected), and then drop by one or two orders of magnitude and change hardly at all.

If you look at this part of your code (posted on 8 Nov):

theta[n_id] = mus + diag_pre_multiply(sds_g, L_omega_g) * Z_g[group[n_id]] + diag_pre_multiply(sds, L_omega) * Z[n_id];


this doesn’t seem so surprising to me, actually. Knowing that 1:N_id is basically iterating from 1 to 138 and then looking at

> example_data_group\$group[1:138]
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[67] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[133] 3 3 3 4 4 4


shows that the loop repeatedly inserted only the Z_g's for groups 1 to 3 (and 4 a little bit). So that should explain why they are different. But I guess this difference is utterly meaningless.

This indeed seem to trickle down to the final parameter estimates at the group level as you post them.

I don’t understand what you mean by this.

Especially for alpha_group, there appears to be hardly any change at all across groups, and beta_group seems to be only little better. If I just play around with my nonparametric data, that does not correspond to what I see. Maybe I am misunderstanding something here, too, but I thought/think this points to an estimation issue.

Yeah, right, there isn’t much variation across groups. The sds_g's would also be a good summary of that (especially the first one—\sigma_{\alpha,\texttt{group}} is very low). However, this is the variation after taking into account the (considerable) variation over IDs. What are you comparing these results to and what did you expect coming out of the model? More heterogeneity across groups given heterogeneity across IDs? Less heterogeneity across IDs?

Hi Max,

this doesn’t seem so surprising to me, actually. Knowing that 1:N_id is basically iterating from 1 to 138 and then looking at

I see what you mean. Here is what I do not understand, though. Please bear with me for a minute. Here is the extract from the model you proposed to me previously for just the id level, without groups:

for (n_id in 1:N_id){
theta[n_id] = mus + diag_pre_multiply(sds, L_omega) * Z[n_id];
rho[n_id] = theta[n_id,1];
alpha[n_id] = exp(theta[n_id,2]);
beta[n_id] = exp(theta[n_id,3]);
}


Now this worked fine, and all the parameters appear to make sense. However, if we apply the same logic with the new indexing, then we should index this as theta[n] = mus + diag_pre_multiply(sds, L_omega) * Z[id[n]]; as well if we want them to loop through all the data, or not? What am I missing here, i.e. what makes the situation now so fundamentally different that we need to loop through single observations?

The other thing is that if I use the new indexing you propose, the problem seems to remain quite similar to the one I had with the original indexing. I attach the matrix of mean Z_g values coming out of this estimation output_Z_g_Max.csv (3.0 KB). You can see that again the first three rows get filled in with what I would consider plausible values, with the entries thereafter dropping by one or two orders of magnitude.

I would indeed expect the parameters at the group level to be imprecisely estimated due to the large variation across IDs. However, it seems implausible that there be virtually no variation across groups as in the graphs you posted. I suppose I would expect the variation across groups to be somewhat smaller than, but not too dissimilar from, the variation across IDs. Does that make sense?

Cheers,

Ferdinand

Hi Ferdinand!

So, in the first model (the one you quoted) the parameters only vary by ID, so we have N_id different \theta_\texttt{ID} (from which we build \rho,\alpha,\beta). Then, in the model block, we loop through all observations and assign the different IDs to the observations:

    for ( i in 1:N ) {

u_high[i] = (1 - exp(-rho[id[i]] * high[i])) / rho[id[i]];
u_low[i] = (1 - exp(-rho[id[i]] * low[i])) / rho[id[i]];

wp[i] = exp( -beta[id[i]] * minus_log_p[i]^alpha[id[i]]);

sigma[i] = tau * (high[i] - low[i]);

mu[i] = log1m((wp[i] * u_high[i] + (1 - wp[i]) * u_low[i]) * rho[id[i]]) / -rho[id[i]];

}
ce ~ normal( mu , sigma );


In the second model we have another grouping index, so it’s \theta_\texttt{ID} and \theta_\texttt{group}, which we have to assign to each observation, and instead of doing that in the model block, we do it directly in the transformed parameters block. With the non-centered approach, this can be written as \theta_n=\mu + diag(\sigma_\texttt{ID[n]})L^\Omega_\texttt{ID[n]} + diag(\sigma_\texttt{group[n]})L^\Omega_\texttt{group[n]}. So for each n we “look up” the respective \texttt{ID[n]} and \texttt{group[n]}. Then, in the model block (which now looks different!), we simply iterate over all n:

  for ( i in 1:N ) {

u_high[i] = (1 - exp(-rho[i] * high[i])) / rho[i];
u_low[i] = (1 - exp(-rho[i] * low[i])) / rho[i];

wp[i] = exp( -beta[i] * minus_log_p[i]^alpha[i]);

sigma[i] = tau * (high[i] - low[i]);

mu[i] = log1m((wp[i] * u_high[i] + (1 - wp[i]) * u_low[i]) * rho[i]) / -rho[i];
}

ce ~ normal( mu , sigma );


The other thing is that if I use the new indexing you propose, the problem seems to remain quite similar to the one I had with the original indexing. I attach the matrix of mean Z_g values coming out of this estimation output_Z_g_Max.csv (3.0 KB). You can see that again the first three rows get filled in with what I would consider plausible values, with the entries thereafter dropping by one or two orders of magnitude.

Hmm… that’s really weird, I’m pretty sure that I didn’t see this in my run of the model. Unfortunately I didn’t save the draws from it—but I’m pretty sure this would have popped up in the plots showing very different and extreme results in the group parameters for groups. Are you sure you didn’t accidentally run the “old” model? Because, the similarity of this “anomaly” is striking…

I would indeed expect the parameters at the group level to be imprecisely estimated due to the large variation across IDs. However, it seems implausible that there be virtually no variation across groups as in the graphs you posted. I suppose I would expect the variation across groups to be somewhat smaller than, but not too dissimilar from, the variation across IDs. Does that make sense?

I think I see what you mean. I sometimes come across situations where I find little variation across or correlation between groups, where I expected a lot more—but, I’ve come to trust the models more often than my intuition (after a lot of error checking!). The amount of “partial pooling” (as captured by \sigma_\texttt{ID}, \sigma_\texttt{group}) is letting the data speak (I don’t like that phrase, though) about an otherwise binary decision (no pooling vs. complete pooling).

That being said, \sigma_\texttt{ID}, \sigma_\texttt{group} are sort of hyperparameters and, while I found that they are usually very robust to different prior specifications, you could start thinking about capturing your prior expectation about the variability across IDs and GROUPs in the priors for \sigma_\texttt{ID}, \sigma_\texttt{group}.

You can start by simply plotting different realizations of multivariate normals for \theta (varying \sigma and L^\Omega and see how your prior expectation could look like. You could also do full blown prior predictive checks, but with such a model I think it’s hard to pin down exactly whats going on in terms of which prior implies what…

Cheers! :)
Max

Hi Max,

thank you for explaining this so patiently. I would never have figured this out. I somehow thought that since the parameters were defined at the id level in the model part, inserting the group level into the expression would pick this up. The reformulation makes perfect sense now.

I just had the model in your formulation with the new indexing converge again. Though I will still run some tests to make sure, it seems indeed that it works (it turns out I made a silly copying mistake in my previous attempt). Oddly enough, I actually do get quite a bit of variation for the alpha and beta parameters at the group level, other than suggested by the graphs you posted and as I would have expected.

Ferdinand

2 Likes

Hey Ferdinand!

thank you for explaining this so patiently. I would never have figured this out. I somehow thought that since the parameters were defined at the id level in the model part, inserting the group level into the expression would pick this up. The reformulation makes perfect sense now.

I have to add that I didn’t come up with this notation. It is used in Gelman’s et al. text books (ARM and BDA3).

I just had the model in your formulation with the new indexing converge again. Though I will still run some tests to make sure, it seems indeed that it works (it turns out I made a silly copying mistake in my previous attempt).

Happens all the time to me. That’s why I asked, haha.

Oddly enough, I actually do get quite a bit of variation for the alpha and beta parameters at the group level, other than suggested by the graphs you posted and as I would have expected.

Yeah, I ran only 2 chains with 600 iterations each. For such a model (hierarchies, non-linearities) you’d probably want a lot more, so the results I posted had to be taken with a pinch of salt (I should have stated that more clearly).

Cool, that it seems to go as you have expected! :)

Cheers!

2 Likes