Centered variance-covariance matrix for hierarchical model

Ferdinand_Vieider · November 11, 2019, 4:07pm

Hi Max,

apologies if this sounded condescending, that was not my intention at all. I suppose I was confused by getting N theta vectors and corresponding parameters instead of just the N_id vectors I was looking for. So yes, I did run the model you proposed, but I saw some behaviour that was similar in nature to what I had seen using my original indexation (in addition to my confusion about the change of index). Your new generated quantities block is indeed illuminating in this respect.

The issue is that, given the data, I would expect to see much more change in the parameters between groups. The reason I looked at Z_g directly was to try and get at the source of what is happening, to make sure the mistake was not sneaking in somewhere else. In the matrix I posted, it seems very odd that the first three lines come in at one scale (which I might consider close to what I would have expected), and then drop by one or two orders of magnitude and change hardly at all.

This indeed seem to trickle down to the final parameter estimates at the group level as you post them. Especially for alpha_group, there appears to be hardly any change at all across groups, and beta_group seems to be only little better. If I just play around with my nonparametric data, that does not correspond to what I see. Maybe I am misunderstanding something here, too, but I thought/think this points to an estimation issue.

Well, I suppose this just means I am still much farther from what I wanted to do than I had thought, or maybe that this is not possible at all. Thank you very much for all your help–I definitely have learned quite a few things in the process.

Cheers,

Ferdinand

Max_Mantei · November 11, 2019, 4:41pm

The issue is that, given the data, I would expect to see much more change in the parameters between groups. The reason I looked at Z_g directly was to try and get at the source of what is happening, to make sure the mistake was not sneaking in somewhere else. In the matrix I posted, it seems very odd that the first three lines come in at one scale (which I might consider close to what I would have expected), and then drop by one or two orders of magnitude and change hardly at all.

If you look at this part of your code (posted on 8 Nov):

theta[n_id] = mus + diag_pre_multiply(sds_g, L_omega_g) * Z_g[group[n_id]] + diag_pre_multiply(sds, L_omega) * Z[n_id];

this doesn’t seem so surprising to me, actually. Knowing that 1:N_id is basically iterating from 1 to 138 and then looking at

> example_data_group$group[1:138]
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [67] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[133] 3 3 3 4 4 4

shows that the loop repeatedly inserted only the Z_g’s for groups 1 to 3 (and 4 a little bit). So that should explain why they are different. But I guess this difference is utterly meaningless.

This indeed seem to trickle down to the final parameter estimates at the group level as you post them.

I don’t understand what you mean by this.

Especially for alpha_group, there appears to be hardly any change at all across groups, and beta_group seems to be only little better. If I just play around with my nonparametric data, that does not correspond to what I see. Maybe I am misunderstanding something here, too, but I thought/think this points to an estimation issue.

Yeah, right, there isn’t much variation across groups. The sds_g’s would also be a good summary of that (especially the first one—\sigma_{\alpha,\texttt{group}} is very low). However, this is the variation after taking into account the (considerable) variation over IDs. What are you comparing these results to and what did you expect coming out of the model? More heterogeneity across groups given heterogeneity across IDs? Less heterogeneity across IDs?

Ferdinand_Vieider · November 11, 2019, 6:50pm

Hi Max,

this doesn’t seem so surprising to me, actually. Knowing that 1:N_id is basically iterating from 1 to 138 and then looking at

I see what you mean. Here is what I do not understand, though. Please bear with me for a minute. Here is the extract from the model you proposed to me previously for just the id level, without groups:

for (n_id in 1:N_id){
    theta[n_id] = mus + diag_pre_multiply(sds, L_omega) * Z[n_id];
    rho[n_id] = theta[n_id,1];
    alpha[n_id] = exp(theta[n_id,2]);
    beta[n_id] = exp(theta[n_id,3]);
  }

Now this worked fine, and all the parameters appear to make sense. However, if we apply the same logic with the new indexing, then we should index this as theta[n] = mus + diag_pre_multiply(sds, L_omega) * Z[id[n]]; as well if we want them to loop through all the data, or not? What am I missing here, i.e. what makes the situation now so fundamentally different that we need to loop through single observations?

The other thing is that if I use the new indexing you propose, the problem seems to remain quite similar to the one I had with the original indexing. I attach the matrix of mean Z_g values coming out of this estimation output_Z_g_Max.csv (3.0 KB). You can see that again the first three rows get filled in with what I would consider plausible values, with the entries thereafter dropping by one or two orders of magnitude.

I would indeed expect the parameters at the group level to be imprecisely estimated due to the large variation across IDs. However, it seems implausible that there be virtually no variation across groups as in the graphs you posted. I suppose I would expect the variation across groups to be somewhat smaller than, but not too dissimilar from, the variation across IDs. Does that make sense?

Cheers,

Ferdinand

Max_Mantei · November 12, 2019, 2:22pm

Hi Ferdinand!

Ferdinand_Vieider:

I see what you mean. Here is what I do not understand, though. Please bear with me for a minute. Here is the extract from the model you proposed to me previously for just the id level, without groups:
for (n_id in 1:N_id){
    theta[n_id] = mus + diag_pre_multiply(sds, L_omega) * Z[n_id];
    rho[n_id] = theta[n_id,1];
    alpha[n_id] = exp(theta[n_id,2]);
    beta[n_id] = exp(theta[n_id,3]);
  }
Now this worked fine, and all the parameters appear to make sense. However, if we apply the same logic with the new indexing, then we should index this as theta[n] = mus + diag_pre_multiply(sds, L_omega) * Z[id[n]]; as well if we want them to loop through all the data, or not? What am I missing here, i.e. what makes the situation now so fundamentally different that we need to loop through single observations?

So, in the first model (the one you quoted) the parameters only vary by ID, so we have N_id different \theta_\texttt{ID} (from which we build \rho,\alpha,\beta). Then, in the model block, we loop through all observations and assign the different IDs to the observations:

    for ( i in 1:N ) {
      
        u_high[i] = (1 - exp(-rho[id[i]] * high[i])) / rho[id[i]];
        u_low[i] = (1 - exp(-rho[id[i]] * low[i])) / rho[id[i]];
        
        wp[i] = exp( -beta[id[i]] * minus_log_p[i]^alpha[id[i]]);
        
        sigma[i] = tau * (high[i] - low[i]);
        
        mu[i] = log1m((wp[i] * u_high[i] + (1 - wp[i]) * u_low[i]) * rho[id[i]]) / -rho[id[i]];
        
    }
    ce ~ normal( mu , sigma );

In the second model we have another grouping index, so it’s \theta_\texttt{ID} and \theta_\texttt{group}, which we have to assign to each observation, and instead of doing that in the model block, we do it directly in the transformed parameters block. With the non-centered approach, this can be written as \theta_n=\mu + diag(\sigma_\texttt{ID[n]})L^\Omega_\texttt{ID[n]} + diag(\sigma_\texttt{group[n]})L^\Omega_\texttt{group[n]}. So for each n we “look up” the respective \texttt{ID[n]} and \texttt{group[n]}. Then, in the model block (which now looks different!), we simply iterate over all n:

  for ( i in 1:N ) {
      
    u_high[i] = (1 - exp(-rho[i] * high[i])) / rho[i];
    u_low[i] = (1 - exp(-rho[i] * low[i])) / rho[i];
        
    wp[i] = exp( -beta[i] * minus_log_p[i]^alpha[i]);
        
    sigma[i] = tau * (high[i] - low[i]);
        
    mu[i] = log1m((wp[i] * u_high[i] + (1 - wp[i]) * u_low[i]) * rho[i]) / -rho[i];
    }
    
  ce ~ normal( mu , sigma );

The other thing is that if I use the new indexing you propose, the problem seems to remain quite similar to the one I had with the original indexing. I attach the matrix of mean Z_g values coming out of this estimation output_Z_g_Max.csv (3.0 KB). You can see that again the first three rows get filled in with what I would consider plausible values, with the entries thereafter dropping by one or two orders of magnitude.

Hmm… that’s really weird, I’m pretty sure that I didn’t see this in my run of the model. Unfortunately I didn’t save the draws from it—but I’m pretty sure this would have popped up in the plots showing very different and extreme results in the group parameters for groups. Are you sure you didn’t accidentally run the “old” model? Because, the similarity of this “anomaly” is striking…

I would indeed expect the parameters at the group level to be imprecisely estimated due to the large variation across IDs. However, it seems implausible that there be virtually no variation across groups as in the graphs you posted. I suppose I would expect the variation across groups to be somewhat smaller than, but not too dissimilar from, the variation across IDs. Does that make sense?

I think I see what you mean. I sometimes come across situations where I find little variation across or correlation between groups, where I expected a lot more—but, I’ve come to trust the models more often than my intuition (after a lot of error checking!). The amount of “partial pooling” (as captured by \sigma_\texttt{ID}, \sigma_\texttt{group}) is letting the data speak (I don’t like that phrase, though) about an otherwise binary decision (no pooling vs. complete pooling).

That being said, \sigma_\texttt{ID}, \sigma_\texttt{group} are sort of hyperparameters and, while I found that they are usually very robust to different prior specifications, you could start thinking about capturing your prior expectation about the variability across IDs and GROUPs in the priors for \sigma_\texttt{ID}, \sigma_\texttt{group}.

You can start by simply plotting different realizations of multivariate normals for \theta (varying \sigma and L^\Omega and see how your prior expectation could look like. You could also do full blown prior predictive checks, but with such a model I think it’s hard to pin down exactly whats going on in terms of which prior implies what…

Cheers! :)
Max

Ferdinand_Vieider · November 12, 2019, 4:41pm

Hi Max,

thank you for explaining this so patiently. I would never have figured this out. I somehow thought that since the parameters were defined at the id level in the model part, inserting the group level into the expression would pick this up. The reformulation makes perfect sense now.

I just had the model in your formulation with the new indexing converge again. Though I will still run some tests to make sure, it seems indeed that it works (it turns out I made a silly copying mistake in my previous attempt). Oddly enough, I actually do get quite a bit of variation for the alpha and beta parameters at the group level, other than suggested by the graphs you posted and as I would have expected.

Many thanks for your help!

Ferdinand

Max_Mantei · November 12, 2019, 4:59pm

Hey Ferdinand!

thank you for explaining this so patiently. I would never have figured this out. I somehow thought that since the parameters were defined at the id level in the model part, inserting the group level into the expression would pick this up. The reformulation makes perfect sense now.

I have to add that I didn’t come up with this notation. It is used in Gelman’s et al. text books (ARM and BDA3).

I just had the model in your formulation with the new indexing converge again. Though I will still run some tests to make sure, it seems indeed that it works (it turns out I made a silly copying mistake in my previous attempt).

Happens all the time to me. That’s why I asked, haha.

Oddly enough, I actually do get quite a bit of variation for the alpha and beta parameters at the group level, other than suggested by the graphs you posted and as I would have expected.

Yeah, I ran only 2 chains with 600 iterations each. For such a model (hierarchies, non-linearities) you’d probably want a lot more, so the results I posted had to be taken with a pinch of salt (I should have stated that more clearly).

Cool, that it seems to go as you have expected! :)

Cheers!

Topic		Replies	Views
FYI: how to easily toggle between centered/non-centered Modeling specification	12	1578	August 11, 2023
Adding covariance matrix to non-centered learning model Modeling cognitive-science	7	973	November 12, 2018
Proper use of sum_to_zero_vector in nested multilevel models Modeling fitting-issues , specification , matrix , cmdstanr , hierarchical-model	17	526	June 2, 2025
Poisson Hierarchical Model Non-centered Parametrization Modeling	3	772	January 12, 2022
Cross-classified Hierarchical Model, Heterogenous Variances Modeling	19	1633	August 11, 2018

Centered variance-covariance matrix for hierarchical model

Related topics