Centered variance-covariance matrix for hierarchical model

Hi all,

I am having trouble implementing a centered version of a var-cov matrix for a hierarchical model. I must be making a trivial mistake somewhere–any help would be highly appreciated!

Background: I have a highly nonlinear model, which I post below (not my real model, but a minimal example). I have been playing around with it for a while, and could not get it to perform well. However, when I use a centered implementation with individual variance parameters for a_id and b_id it works like a charm.

I suppose the issue was due to the non-centered implementation rather than to including the var-covar matrix. However, I just cannot figure out how to change my code to a centered implementation–all the examples I can find seem to be based on non-centered models. Any suggestions?

data{
    int<lower=1> N;
    int<lower=1> N_id;
    int id[N];
    real d[N];
    real x[N];
    real y[N];
}
parameters{
    real a;
    real b;
    real<lower=0> sigma;
    matrix[2,N_id] z_N_id;
    cholesky_factor_corr[2] L_Rho_id;
    vector<lower=0>[2] sigma_id;
}
transformed parameters{
    matrix[N_id,2] v_N_id;
    vector[N_id] a_id;
    vector[N_id] b_id;
    matrix[2,2] Rho_id;
    v_N_id = (diag_pre_multiply(sigma_id,L_Rho_id)*z_N_id)';
    a_sub = col(v_N_id,1);
    b_sub = col(v_N_id,2);
    Rho_sub = L_Rho_id * L_Rho_id';
}
model{
    vector[N] mu;
    L_Rho_id ~ lkj_corr_cholesky( 4 );
    sigma_id ~ std_normal();
    sigma ~ normal( 0.2 , 0.1 );
    b ~ normal( 0 , 0.25 );
    a ~ normal( 0 , 0.1 );
    to_vector(z_N_id) ~ std_normal();
    for ( i in 1:N ) {
        mu[i] = exp(-(a + a_id[id[i]]) * x[i]) * (y[i]^(b + b_id[id[i]]));
    }
    d ~ normal( mu , sigma );
}

Hi Ferdinand!

I think a centered approach to this would simply be

v_N_id ~ multi_normal_cholesky(mu, diag_pre_multiply(sigma_id, L_Rho_id));

But looking at your model, I’m not really sure if it’s doing what you think it should do. Did you mean to write

for ( i in 1:N ) {
  mu[i] = exp(-a_sub[id[i]] * x[i]) * (y[i]^b_sub[id[i]]);
}

or something along those lines, because in the model above, there is no hierarchical parameter going into the likelihood d ~ normal(mu, sigma). (Could also very well be, that I’m missing something.)

Hi Max,

thank you very much for your help! There was indeed a mistake in the toy model I posted–the hierarchical part was not in there. Apologies. I have now corrected this in the code above.

Unfortunately there is still something I do not understand. The distribution for v_N_id you suggest presumably replaces the prior for z_N_id. That is, I would insert

v_N_id ~ multi_normal_cholesky(mu, diag_pre_multiply(sigma_id, L_Rho_id));

in place of

to_vector(z_N_id) ~ std_normal();

I am not sure about the role of mu in your code. I suppose that this is different from the mu in my likelihood, say a vector of zeros. However, that does not seem to work. What dimension would that vector need to have given that v_N_id is a Kx2 matrix? And do I need to change something in the rest of my code as well?

Thanks!

Ferdinand

Hey! Yeah, that makes sense (although there is still the a_sub in the code, which should be a_id, but I’m being pedantic here…).

The mu in the code snippet I posted above would be just a vector of a and b, i.e. vector[2] mu = [a, b]', or something like that, because these are the centers of the multivariate normal. In the centered parameterization, you actually don't need to include the standard normal z's.

Consider the univariate case:

//Non-centered (in transformed parameters or model block):
for (n in 1:N_id)
  a_id[n] = a + sigma_a*z_a[n]; // with z_a ~ std_normal()
//Centered (only in model block):
for (n in 1:N_id)
  a_id[n] ~ normal(a, sigma_a); // no need for auxiliary z;
//both versions are vectorizable, but I wanted to make it a bit more clear.

So in a bi-variate case, we “simply” switch to the multi_normal or multi_normal_cholesky and add in a correlation matrix (or its cholesky factor) when we want to code it as centered.
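
For concreteness, here is a minimal sketch (not tested, and reusing the names from your model) of what the bivariate centered blocks could look like. Note that v_N_id is then centered on [a, b]', so it already contains the intercepts and the likelihood would use v_N_id[id[i], 1] and v_N_id[id[i], 2] directly:

// sketch only: centered bivariate version, reusing the names from the model above
parameters{
    real a;
    real b;
    cholesky_factor_corr[2] L_Rho_id;
    vector<lower=0>[2] sigma_id;
    vector[2] v_N_id[N_id];   // one (a, b) pair per id, drawn directly (no z's)
}
model{
    vector[2] mu_ab = [a, b]';   // the centers of the multivariate normal
    L_Rho_id ~ lkj_corr_cholesky(4);
    sigma_id ~ std_normal();
    a ~ normal(0, 0.1);
    b ~ normal(0, 0.25);
    // centered prior: no auxiliary z's and no transformed parameters needed
    v_N_id ~ multi_normal_cholesky(mu_ab, diag_pre_multiply(sigma_id, L_Rho_id));
    // ... likelihood as before, now using v_N_id[id[i], 1] and v_N_id[id[i], 2]
}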

Hm, not sure if that makes it clear… In any case, you can also check out the Stan User's Guide (chapter 22.7, scroll down to Multivariate Reparameterizations).

Thanks again, Max, but I still don't get it (and I am getting sloppy, it seems). So how about this:

data{
    int<lower=1> N;
    int<lower=1> N_id;
    int id[N];
    real d[N];
    real x[N];
    real y[N];
}
parameters{
    real a;
    real<lower=0> b;
    real<lower=0> sigma;
    cholesky_factor_corr[2] L_Rho_id;
    vector<lower=0>[2] sigma_id;
}
transformed parameters{
    matrix[N_id,2] v_N_id;
    vector[N_id] a_id;
    vector[N_id] b_id;
    matrix[2,2] Rho_id;
    a_id = col(v_N_id,1);
    b_id = col(v_N_id,2);
    Rho_id = L_Rho_id * L_Rho_id';
}
model{
    vector[N] mu;
    vector[2] zeros = [0 , 0]';
    L_Rho_id ~ lkj_corr_cholesky( 4 );
    sigma_id ~ std_normal( );
    sigma ~ normal( 0.2 , 0.5 );
    b ~ normal( 0 , 0.25 );
    a ~ normal( 0 , 0.1 );
    for (n in 1:N_id)
        v_N_id[n] ~ multi_normal_cholesky(zeros , diag_pre_multiply(sigma_id , L_Rho_id));
    for (i in 1:N) {
        mu[i] = exp(-(a + a_id[id[i]]) * x[i]) * (y[i]^(b + b_id[id[i]]));
    }
    d ~ normal( mu , sigma );
}

Now, however, I get a bunch of initialization failures after 100 attempts. Starting values, restricting init_r, etc. do not help. I suppose there is still something wrong in the code, but I still cannot figure out what. I read the passages of the user guide you suggested, but still could not find the problem. Any ideas?

Thanks again,

Ferdinand

As far as I can tell this looks better. Good starting values are very important for non-linear models. It’s also important to impose constraints where possible, but I guess a and b can vary freely (positive/negative values)? Hm, does it help writing it as \mu_i = \exp(-ax + b\log (y)) (assuming y>0)?

Also, at least from my experience, something like d_i \sim \text{N}(\exp(\mu_i), \sigma) is usually hard to fit, because homoscedasticity tends to be unusual with strictly positive (and nonlinear) \mu. Did you try something like d_i \sim \text{LogNormal}(-ax + b\log (y), \sigma)? (All of this is assuming that y>0, of course…)
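
Just to illustrate (a rough sketch only, assuming d > 0 and y > 0, and keeping the names from the toy model above):

// sketch only: LogNormal likelihood with the same nonlinear mean on the log scale
// (assumes d > 0 and y > 0)
for (i in 1:N)
    mu[i] = -(a + a_id[id[i]]) * x[i] + (b + b_id[id[i]]) * log(y[i]);
d ~ lognormal(mu, sigma);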

Still not there I am afraid. I have been playing around with the starting values and some changes in the way the model is written, but to no avail. I now get the following error:

Error evaluating the log probability at the initial value.
Chain 4: Exception: validate transformed params: b_id[1] is nan, but must be greater than or equal to -0.8

(the -0.8 comes from a constraint on b being positive)

Oddly, the error seems to be caused by whatever hierarchical parameter is in the second column of v_N_id, i.e. if I switch the two around it is the other parameter giving me the error. It also seems strange that the same model works well without the var-covar matrix, and even with the non-centered var-cov matrix it 'works' (except for convergence being terrible). I thus suspect that I still have a problem in the code, but I just cannot figure out where.

Can you post your data (or a subsample) here so I can have a look? Also, the only constraint is b>0? Or also y>0?

The one I posted was supposed to be a minimal example; the actual one is a little more complex. There are 3 hierarchical parameters, two of which are constrained to be non-negative. I attach the full model and a minimal dataset I am using to test the model.

data_example.csv (14.4 KB) full_stan_model.stan (1.5 KB)

I think to ensure \alpha_{id[i]} > 0 and \beta_{id[i]} > 0 it is better to model \texttt{log_alpha}_{id[i]} directly.

It's a bit hard for me to follow all the details here, but parts of the model are super prone to blowing up / collapsing… for example (lines 43 + 44), stuff like \exp(-\rho_{id[i]} \cdot high), where \rho is not restricted to be positive and high>20, seems strange. Should that be \log(high)?

Also, line 45, \exp(-\beta_{id[i]}(-\log p^{\alpha_{id[i]}})), seems awkward.
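
(As an aside, and just a sketch: for the 1 - \exp(-\rho \cdot high) terms, expm1 avoids the cancellation when \rho is close to zero, although it does not help with the overflow for large negative exponents.)

// sketch only: -expm1(-x) computes 1 - exp(-x) without cancellation when x is small
u_high[i] = -expm1(-rho[id[i]] * high[i]) / rho[id[i]];
u_low[i]  = -expm1(-rho[id[i]] * low[i]) / rho[id[i]];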

I tried some stuff here, but basically everything breaks down at wp… (you can see here how I packed together rho, log_alpha, and log_beta into a new vector theta. This coding would be a non-centered parameterization though, since I usually try those first, but I could not even get the model to run because of the issues with the computations [see above]):

data{
    int<lower=1> N;
    int<lower=1> N_id;
    int id[N];
    vector[N] ce;
    vector[N] high;
    vector[N] low;
    vector[N] p;
}
parameters{
    vector[3] mus;
    real<lower=0> tau;
    vector<lower=0>[3] sigma_sub;
    cholesky_factor_corr[3] L_Rho_sub;
    vector[3] Z[N_id];
}
transformed parameters{
    vector[3] theta[N_id];
    vector[N_id] rho;
    vector<lower=0>[N_id] alpha;
    vector<lower=0>[N_id] beta;
    vector<lower=0>[N] sigma = tau * (high - low);
    vector[N] u_high;
    vector[N] u_low;
    vector[N] wp;
    vector[N] mu;
    
    for (n_id in 1:N_id){
      theta[n_id] = mus + diag_pre_multiply(sigma_sub, L_Rho_sub) * Z[n_id];
      rho[n_id] = theta[n_id, 1];
      alpha[n_id] = exp(theta[n_id, 2]);
      beta[n_id] = exp(theta[n_id, 3]);
    }
    
    u_high = (1 - exp(-(rho[id]) .* high)) ./ rho[id];
    u_low = (1 - exp(-(rho[id]) .* low)) ./ rho[id];
    
    for (i in 1:N){
      wp[i] = exp(-beta[id[i]] * (-log(p[i])^(alpha[id[i]])));
      mu[i] = (log(-(((wp[i] * u_high[i] + (1 - wp[i]) * u_low[i]) * (rho[id[i]])) - 1))) / (-(rho[id[i]]));
    }
    
    print("u_high:", u_high);
    print("u_low:", u_low);
    print("wp:", wp);
    print("mu:", mu);

}
model{
    mus[1] ~ normal( 0 , 0.1 ); // prior mu for rho
    mus[2] ~ normal( -0.35 , 0.1 ); // prior mu for log(alpha)
    mus[3] ~ normal( 0, 0.1); // prior mu for log(beta)
    
    tau ~ normal( 0.2 , 0.1 );

    sigma_sub ~ std_normal( );
    L_Rho_sub ~ lkj_corr_cholesky( 3 );
    for (n_id in 1:N_id)
      Z[n_id] ~ std_normal();
    
    ce ~ normal( mu , sigma );
}

Is this some model from the literature that you can link to? Some paper describing u_high, u_low, and wp?

Hi Max, I appreciate all your efforts on this. Yes, it is a fairly standard model of decision making under risk, and I have used these functional forms many times. See the following link for the “awkward” function:

The issue that bothers me is that my non-centered version of the model (the first minimal example I posted above) actually runs fine even with the non-negativity constraints, except that it struggles to converge to something meaningful. Notice that to simplify, one could also define u_high = high^rho[id], and redefine high_std = high/40 and low_std = low/40. Again, I do not think the functional forms by themselves are the issue, since they work fine without the var-covar matrix.
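
In Stan that alternative would look roughly like this (sketch only; high_std and low_std are the rescaled outcomes, which could be precomputed in the transformed data block):

// sketch only: power-utility alternative, with high_std = high/40 and low_std = low/40
// precomputed in transformed data
u_high[i] = high_std[i] ^ rho[id[i]];
u_low[i]  = low_std[i] ^ rho[id[i]];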

Yes, it is a fairly standard model of decision making under risk, and I have used these functional forms many times. See the following link for the “awkward” function

Sorry, I didn't mean to sound snarky. It's just that those kinds of things run into numerical instabilities super fast, and with hierarchical models you generally need some flexibility…

Anyways, running the following with iter = 2000 and init_r = 0.5 actually worked:

data{
    int<lower=1> N;
    int<lower=1> N_id;
    int id[N];
    real ce[N];
    real high[N];
    real low[N];
    real p[N];
}
transformed data{
  real minus_log_p[N];
  
  for (i in 1:N)
    minus_log_p[i] = -log(p[i]);
}
parameters{
    vector[3] mus;
    vector<lower=0>[3] sds;
    cholesky_factor_corr[3] L_omega;
    vector[3] Z[N_id];

    real<lower=0> tau;

}
transformed parameters{
  vector[3] theta[N_id];
  vector[N_id] rho;
  vector<lower=0>[N_id] alpha;
  vector<lower=0>[N_id] beta;

  for (n_id in 1:N_id){
    theta[n_id] = mus + diag_pre_multiply(sds, L_omega) * Z[n_id];
    rho[n_id] = theta[n_id,1];
    alpha[n_id] = exp(theta[n_id,2]);
    beta[n_id] = exp(theta[n_id,3]);
  }
  
}
model{
    vector[N] sigma;
    vector[N] wp;
    vector[N] u_low;
    vector[N] u_high;
    vector[N] mu;
    
    sds ~ exponential(5);
    L_omega ~ lkj_corr_cholesky(4);
    mus[1] ~ normal(0, 0.1);
    mus[2] ~ normal(-0.35, 0.1);
    mus[3] ~ normal(0, 0.1);
    
    for (n_id in 1:N_id)
      Z[n_id] ~ std_normal();

    tau ~ normal( 0.2 , 0.1 );
    
    
    
    for ( i in 1:N ) {
      
        u_high[i] = (1 - exp(-rho[id[i]] * high[i])) / rho[id[i]];
        u_low[i] = (1 - exp(-rho[id[i]] * low[i])) / rho[id[i]];
        
        wp[i] = exp( -beta[id[i]] * minus_log_p[i]^alpha[id[i]]);
        
        sigma[i] = tau * (high[i] - low[i]);
        
        mu[i] = log1m((wp[i] * u_high[i] + (1 - wp[i]) * u_low[i]) * rho[id[i]]) / -rho[id[i]];
        
    }
    ce ~ normal( mu , sigma );
}


@Ferdinand_Vieider did this work for you as well? Otherwise I can also post the centered version of the code above, if you need it.

Cheers!

Hi Max, sorry, I got somewhat side-tracked.

The model works very well in the version you posted–thanks a lot! I had some difficulties interpreting it at first, but this was just due to the re-definition of some parameters in your model that I missed initially. I got it now (hence the edit of my original reply).

Nevertheless, if you also have a centered version it would be great if you could post that. I would be very keen on seeing how that works, and on running some comparisons between the two models to see how their performance compares on my data.

Cheers,
Ferdinand

Hm, yeah, I should have commented that more thoroughly. Sorry.

My approach to a centered version would be this:

Centered model code

data{
    int<lower=1> N;
    int<lower=1> N_id;
    int id[N];
    real ce[N];
    real high[N];
    real low[N];
    real p[N];
}

transformed data{
  real minus_log_p[N];
  vector[N] difference;
  
  for (i in 1:N){
    minus_log_p[i] = -log(p[i]);
    difference[i] = high[i] - low[i];
  }
}

parameters{
    vector[3] mus;
    vector<lower=0>[3] sds;
    cholesky_factor_corr[3] L_omega;
    vector[3] theta[N_id];

    real<lower=0> tau;

}

transformed parameters{

}

model{
    vector[N] sigma = tau * difference;
    vector[N] wp;
    vector[N] u_low;
    vector[N] u_high;
    vector[N] mu;
    vector[N_id] rho;
    vector[N_id] alpha;
    vector[N_id] beta;
  
    sds ~ exponential(5);
    L_omega ~ lkj_corr_cholesky(4);
    mus[1] ~ normal(0, 0.1);
    mus[2] ~ normal(-0.35, 0.1);
    mus[3] ~ normal(0, 0.1);
    
    theta ~ multi_normal_cholesky(mus, diag_pre_multiply(sds, L_omega));
    
    for (n_id in 1:N_id){
      rho[n_id] = theta[n_id,1];
      alpha[n_id] = exp(theta[n_id,2]);
      beta[n_id] = exp(theta[n_id,3]);
    }
    
    tau ~ normal( 0.2 , 0.1 );
    
    for ( i in 1:N ) {
      
      u_high[i] = (1 - exp(-rho[id[i]] * high[i])) / rho[id[i]];
      u_low[i] = (1 - exp(-rho[id[i]] * low[i])) / rho[id[i]];
        
      wp[i] = exp( -beta[id[i]] * minus_log_p[i]^alpha[id[i]]);
      mu[i] = log1m((wp[i] * u_high[i] + (1 - wp[i]) * u_low[i]) * rho[id[i]]) / -rho[id[i]];
        
    }
    
    ce ~ normal(mu, sigma);
    
}
generated quantities{
    vector[N_id] rho;
    vector[N_id] alpha;
    vector[N_id] beta;
    
    for (n_id in 1:N_id){
      rho[n_id] = theta[n_id,1];
      alpha[n_id] = exp(theta[n_id,2]);
      beta[n_id] = exp(theta[n_id,3]);
    }
}

Running this I got some warnings:

Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess

Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess

But the results are basically identical. The centered version has slight problems recovering the standard deviations of theta:

Centered model results

> print(posterior_hierarchical_centered, c("mus", "sds", "L_omega"))
Inference for Stan model: full_stan_model_hierarchical_centered.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

              mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
mus[1]        0.02    0.00 0.01  0.00  0.01  0.02  0.03  0.04  1729 1.00
mus[2]       -0.34    0.00 0.07 -0.47 -0.38 -0.34 -0.29 -0.20  1780 1.00
mus[3]        0.01    0.00 0.08 -0.15 -0.04  0.01  0.06  0.16  3019 1.00
sds[1]        0.06    0.00 0.01  0.04  0.05  0.05  0.06  0.08   788 1.00
sds[2]        0.36    0.00 0.10  0.17  0.29  0.35  0.42  0.56   448 1.01
sds[3]        0.70    0.00 0.10  0.51  0.63  0.69  0.76  0.92  1370 1.00
L_omega[1,1]  1.00     NaN 0.00  1.00  1.00  1.00  1.00  1.00   NaN  NaN
L_omega[1,2]  0.00     NaN 0.00  0.00  0.00  0.00  0.00  0.00   NaN  NaN
L_omega[1,3]  0.00     NaN 0.00  0.00  0.00  0.00  0.00  0.00   NaN  NaN
L_omega[2,1] -0.27    0.01 0.21 -0.66 -0.42 -0.27 -0.12  0.17  1146 1.00
L_omega[2,2]  0.94    0.00 0.07  0.75  0.91  0.96  0.99  1.00  1060 1.00
L_omega[2,3]  0.00     NaN 0.00  0.00  0.00  0.00  0.00  0.00   NaN  NaN
L_omega[3,1] -0.52    0.00 0.15 -0.75 -0.62 -0.53 -0.43 -0.19  1337 1.00
L_omega[3,2] -0.27    0.00 0.18 -0.60 -0.40 -0.28 -0.16  0.10  1548 1.00
L_omega[3,3]  0.77    0.00 0.12  0.51  0.69  0.78  0.86  0.96  1269 1.00

Samples were drawn using NUTS(diag_e) at Wed Sep 25 17:27:48 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

While the non-centered model had slight problems recovering the correlation coefficients (this model however did not throw errors):

Non-centered model results

> print(posterior_hierarchical, c("mus", "sds", "L_omega"))
Inference for Stan model: full_stan_model_hierarchical.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

              mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
mus[1]        0.02    0.00 0.01  0.00  0.01  0.02  0.03  0.04  2933 1.00
mus[2]       -0.33    0.00 0.07 -0.47 -0.38 -0.33 -0.29 -0.20  5147 1.00
mus[3]        0.01    0.00 0.08 -0.15 -0.05  0.01  0.06  0.16  4983 1.00
sds[1]        0.06    0.00 0.01  0.04  0.05  0.06  0.06  0.08  2859 1.00
sds[2]        0.35    0.00 0.10  0.14  0.28  0.35  0.41  0.55  1396 1.00
sds[3]        0.70    0.00 0.10  0.52  0.63  0.70  0.77  0.92  1301 1.01
L_omega[1,1]  1.00     NaN 0.00  1.00  1.00  1.00  1.00  1.00   NaN  NaN
L_omega[1,2]  0.00     NaN 0.00  0.00  0.00  0.00  0.00  0.00   NaN  NaN
L_omega[1,3]  0.00     NaN 0.00  0.00  0.00  0.00  0.00  0.00   NaN  NaN
L_omega[2,1] -0.26    0.00 0.22 -0.67 -0.41 -0.26 -0.10  0.19  3294 1.00
L_omega[2,2]  0.94    0.00 0.07  0.74  0.91  0.96  0.99  1.00  2781 1.00
L_omega[2,3]  0.00     NaN 0.00  0.00  0.00  0.00  0.00  0.00   NaN  NaN
L_omega[3,1] -0.52    0.00 0.14 -0.77 -0.62 -0.53 -0.44 -0.20   956 1.00
L_omega[3,2] -0.27    0.01 0.18 -0.60 -0.40 -0.28 -0.16  0.13   428 1.02
L_omega[3,3]  0.77    0.01 0.12  0.52  0.69  0.78  0.85  0.96   495 1.01

Samples were drawn using NUTS(diag_e) at Wed Sep 25 17:26:12 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

The choice of parameterization also depends on the data that you have, so I can't say which model is more appropriate for your full data set – but you can try both.

Cheers!

This is great, thanks! I will make sure to try out both.

Hi Max,

I would like to ask a follow-up question to this post if I may. Let me know if I should rather open a new question for it and I will do so.

I have done quite a bit of testing using the non-centered model you proposed above, and it works very well. However, now I would like to add an additional level to the hierarchy. I thought something like this ought to do the trick:

for (n_id in 1:N_id){
    theta[n_id] = mus + diag_pre_multiply(sds_g, L_omega_g) * Z_g[group[n_id]] + diag_pre_multiply(sds, L_omega) * Z[n_id];
    rho[n_id] = theta[n_id,1];
    alpha[n_id] = exp(theta[n_id,2]);
    beta[n_id] = exp(theta[n_id,3]);
  }

with an additional prior

for (n_g in 1:N_group)
      Z_g[n_g] ~ std_normal();

and the variables defined much as before. It should then be possible to recover the group-level parameters in the generated quantities block.

The issue is, this model does not seem to work. While it still converges quite happily and all looks good at first, it turns out that the model only fills in the first 6-7 entries in Z_g, after which all the values are virtually the same. The lower-level entries in Z and the parameters rho, alpha, beta still look fine, but somehow the model seems to 'ignore' the group level. Is there something obvious I am missing here? Here is a complete example:

data{
    int<lower=1> N;
    int<lower=1> N_id;
    int<lower=1> N_group;
    int id[N];
    int group[N];
    real ce[N];
    real high[N];
    real low[N];
    real p[N];
}
transformed data{
  real minus_log_p[N];
  for (i in 1:N)
    minus_log_p[i] = -log(p[i]);
}
parameters{
    vector[3] mus;
    vector<lower=0>[3] sds;
    vector<lower=0>[3] sds_g;
    cholesky_factor_corr[3] L_omega;
    cholesky_factor_corr[3] L_omega_g;
    vector[3] Z[N_id];
    vector[3] Z_g[N_group];
    real<lower=0> tau;
}
transformed parameters{
  vector[3] theta[N_id];
  vector[N_id] rho;
  vector<lower=0>[N_id] alpha;
  vector<lower=0>[N_id] beta;
  for (n_id in 1:N_id){
    theta[n_id] = mus + diag_pre_multiply(sds_g, L_omega_g) * Z_g[group[n_id]] + diag_pre_multiply(sds, L_omega) * Z[n_id];
    rho[n_id] = theta[n_id,1];
    alpha[n_id] = exp(theta[n_id,2]);
    beta[n_id] = exp(theta[n_id,3]);
  }
}
model{
    vector[N] sigma;
    vector[N] wp;
    vector[N] u_low;
    vector[N] u_high;
    vector[N] mu;
    sds ~ exponential(5);
    sds_g ~ exponential(5);
    L_omega ~ lkj_corr_cholesky(4);
    L_omega_g ~ lkj_corr_cholesky(4);
    mus[1] ~ normal(0, 0.1);
    mus[2] ~ normal(-0.35, 0.1);
    mus[3] ~ normal(0, 0.1);
 for (n_id in 1:N_id)
      Z[n_id] ~ std_normal();
 for (n_g in 1:N_group)
      Z_g[n_g] ~ std_normal();
    tau ~ normal( 0.2 , 0.1 );
    for ( i in 1:N ) {   
        u_high[i] = (1 - exp(-rho[id[i]] * high[i])) / rho[id[i]];
        u_low[i] = (1 - exp(-rho[id[i]] * low[i])) / rho[id[i]];   
        wp[i] = exp( -beta[id[i]] * minus_log_p[i]^alpha[id[i]]);
        sigma[i] = tau * (high[i] - low[i]);
        mu[i] = log1m((wp[i] * u_high[i] + (1 - wp[i]) * u_low[i]) * rho[id[i]]) / -rho[id[i]];  
    }
    ce ~ normal( mu , sigma );
}
generated quantities{
    vector[3] theta_g[N_group];
    vector[N_group] rho_g;
    vector[N_group] alpha_g;
    vector[N_group] beta_g;
   for (n_g in 1:N_group){
    theta_g[n_g] = mus + diag_pre_multiply(sds_g, L_omega_g) * Z_g[n_g];
    rho_g[n_g] = theta_g[n_g,1];
    alpha_g[n_g] = exp(theta_g[n_g,2]); 
    beta_g[n_g] = exp(theta_g[n_g,3]); 
  } 
}

Any ideas what I am doing wrong here?

Cheers,
Ferdinand

I think this is a simple indexing error.

This should work:

data{
  int<lower=1> N;
  int<lower=1> N_id;
  int<lower=1> N_group;
  int id[N];
  int group[N];
  real ce[N];
  real high[N];
  real low[N];
  real p[N];
}
transformed data{
  real minus_log_p[N];
  
  for (i in 1:N)
    minus_log_p[i] = -log(p[i]);
}
parameters{
  vector[3] mus;
  vector<lower=0>[3] sds;
  vector<lower=0>[3] sds_g;

  cholesky_factor_corr[3] L_omega;
  cholesky_factor_corr[3] L_omega_g;

  vector[3] Z[N_id];
  vector[3] Z_g[N_group];


  real<lower=0> tau;

}
transformed parameters{
  vector[3] theta[N];
  vector[N] rho;
  vector<lower=0>[N] alpha;
  vector<lower=0>[N] beta;

  for (n in 1:N){
    theta[n] = mus + diag_pre_multiply(sds_g, L_omega_g) * Z_g[group[n]] + diag_pre_multiply(sds, L_omega) * Z[id[n]];
    rho[n] = theta[n,1];
    alpha[n] = exp(theta[n,2]);
    beta[n] = exp(theta[n,3]);
  }
  
}
model{
  vector[N] sigma;
  vector[N] wp;
  vector[N] u_low;
  vector[N] u_high;
  vector[N] mu;
    
  sds ~ exponential(5);
  sds_g ~ exponential(5);
  L_omega ~ lkj_corr_cholesky(4);
  L_omega_g ~ lkj_corr_cholesky(4);
  mus[1] ~ normal(0, 0.1);
  mus[2] ~ normal(-0.35, 0.1);
  mus[3] ~ normal(0, 0.1);
    
  for (n_id in 1:N_id)
    Z[n_id] ~ std_normal();
    
  for (n_g in 1:N_group)
    Z_g[n_g] ~ std_normal();

  for ( i in 1:N ) {
      
    u_high[i] = (1 - exp(-rho[i] * high[i])) / rho[i];
    u_low[i] = (1 - exp(-rho[i] * low[i])) / rho[i];
        
    wp[i] = exp( -beta[i] * minus_log_p[i]^alpha[i]);
        
    sigma[i] = tau * (high[i] - low[i]);
        
    mu[i] = log1m((wp[i] * u_high[i] + (1 - wp[i]) * u_low[i]) * rho[i]) / -rho[i];
    }
    
  ce ~ normal( mu , sigma );
}

Let me know if it does(n’t) work!

Cheers! :)
Max

Hi Max,

thank you for the suggestion, but this does not seem to be it. In the model you proposed on September 23–and on which I based my modification above–the theta vector is defined at the level of the id identifier, i.e. theta[n_id]. Defining it at the level of the observation n means that the parameters are no longer identified, since a single data point is not sufficient to identify a parameter.

I attach a minimal dataset with the group identifier, so that you can see the structure of the data: example_data_group.csv (81.4 KB) . I also attach the output I obtain using these data for Z_g from running the model I posted above: output_Z_g.csv (3.0 KB).

The first three parameter vectors/rows of Z_g would appear to be on the correct scale, with little variation thereafter. So I suspect indeed that there is some indexing issue, but I just cannot figure out how to do this correctly. Any ideas?

Thanks a lot!

Ferdinand

Hi Ferdinand!

Did you actually try the model that I posted? Given the data that you uploaded it seems to work well (not sure though what output you expect). I actually just fixed the indexing. You can see that the model block is literally identical to the model you posted.

Defining it at the level of the observation n means that the parameters are no longer identified, since one single datapoint is not sufficient to identify a parameter.

TBH this triggers me a bit. Ignoring for a moment that 1) this is not what's happening in the model I posted (same parameters block) and that 2) this might be a valid point in frequentist estimation, but is a kind of weird (if not wrong) way to think about Bayesian hierarchical models—this statement sounds very condescending. Apologies if this was not your intention, but I see this attitude a lot from (dogmatic) Econometricians (I'm at an Econ department)—and it's annoying. I'm actually pretty sure that you didn't mean it that way, but I had to get this off my chest. Sorry! haha

With that out of the way: I ran the model on the data you uploaded (it took a while) with init_r = 0.5, iter = 2000, chains = 4 and got kind of reasonable results. You should either try the centered parameterization (although I doubt it'll do a much better job) or run the chains for a few more iterations, as I got some low Bulk-ESS warnings (especially for the correlation parameters).

I’m not sure why you would want to look at Z or Z_g so I added a generated quantities block to the model.

generated quantities{
  vector[N_group] rho_group;
  vector[N_id] rho_id;
  vector[N_group] alpha_group;
  vector[N_id] alpha_id;
  vector[N_group] beta_group;
  vector[N_id] beta_id;
  
  {
    vector[3] dump;
    for(n_g in 1:N_group){
      dump = mus + diag_pre_multiply(sds_g, L_omega_g)*Z_g[n_g];
      rho_group[n_g] = dump[1];
      alpha_group[n_g] = exp(dump[2]);
      beta_group[n_g] = exp(dump[3]);
    }
    for(n_id in 1:N_id){
      dump = mus + diag_pre_multiply(sds, L_omega)*Z[n_id];
      rho_id[n_id] = dump[1];
      alpha_id[n_id] = exp(dump[2]);
      beta_id[n_id] = exp(dump[3]);
    }
  }
}

This will compute the implied parameters for each group/id (not combinations of those though).

Here are the 95% posterior quantiles (plus posterior medians):

[Plot: posterior intervals for the ID-level parameters (it's a bit small…)]

[Plot: posterior intervals for the group-level parameters]

Seems plausible, no?

Cheers!
