Using Hierarchical mean estimates as predictors

Hi everyone,

I’m trying to build a model that propagates uncertainty across several linked components (possibly something like a Bayesian path model / SEM). Below is a simplified version of the full model, which is behaving in some unexpected ways.

Data structure:

Site (gridref)

Site-year (gridref-year)

Transects within each site-year

I want to test how species richness predicts total abundance, separating:

  1. Between-site differences in richness

  2. Within-site (year-to-year) variation in richness

  3. The interaction between these two.

In a later stage of the model, I want to propagate these estimates forward to predict other responses (tick abundance and Borrelia prevalence), but I’ve removed those parts here for simplicity.

First, I decompose richness into site and site-year components using random effects. I then use estimated site-level richness, site-year deviation from that richness, and their interaction as predictors of abundance, including their interaction.

The model runs without obvious diagnostics issues (no divergences, good Rhat), but there’s a posterior ridge between the intercepts for the richness and abundance components and when I add the downstream response variables (ticks and Borrelia prevalence), the geometry becomes more problematic and sampling gets ugly.

I tried to scale the site-level and site-year level richness estimates in the transformed parameters block but that also makes things worse!

What might explain this posterior ridge between the intercepts?

Is this a known identifiability issue when propagating latent predictors between model components?

Is there a better modeling strategy I should consider?

Any advice would be greatly appreciated.

data {
  
  int n_bbs_tr_v ;                            // number of transect visit's 
  int gr_yrs ;                                // number of gridref year's 
  int grs ;                                   // number of grid refferences (sites)   
  int gr_for_gr_yr[gr_yrs] ;                  // gridref to which each gridref year belongs
  int gr_yr_for_tr_v[n_bbs_tr_v] ;            // gridref year to which each transect visit belongs (only those sampled in the bbs)
  int rich [n_bbs_tr_v] ;                     // observed species richness (transect level)
  real visit[n_bbs_tr_v] ;                    // was the visit early or late in the season (--0.5 or 0.5)
  int babnd [n_bbs_tr_v] ;                    // total abundance of birds
  
}

parameters{
  
  //AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA//
  // MODEL A - estimating gridref-year richness ////////////////////////////////
  //AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA//
  
  real rich_alpha ;                            // global mean richness
  real rich_beta_visit ;                       // visit time of year coefficient  
  
  vector[gr_yrs] rich_gr_yr_raw;               // raw random effect 
  real<lower = 0> rich_gr_yr_sigma;            // gr_yr random effect scale (multiple transects within a gr_yr) 
  
  vector[grs] rich_gr_raw;                     // gr raw random effect 
  real<lower = 0> rich_gr_sigma;               // gr random effect scale (multiple gr_yrs (years) within a gr (a site))
  
  // //AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA//
  // // MODEL B2 - estimating gridref-year bird abundance /////////////////////////
  // //AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA//

  real babnd_alpha ;                                 // community competence global intercept
  real babnd_beta_visit ;

  
  real babnd_rich_mean ;                           // effect of gr (site) level richness (latent) on bird abundance
  real babnd_rich ;                                // effect of deviation from gr (site level mean - latent) on bird abundance 
  real babnd_rm_int ;                              // an interaction between the two (latent * latent)


  vector[gr_yrs] babnd_gr_yr_raw ;                 // the same random effect structure as above to capture variation not explained by richness
  real<lower = 0> babnd_sigma_gr_yr ;              // i.e site level processes other than richness could influence total abundance 
  vector[grs] babnd_gr_raw ;                       // (richness is just a proxy for some unobserved process i.e habitat degredation or something like that ...)
  real<lower = 0> babnd_gr_sigma ;

}

transformed parameters{
  
  //AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA//
  // MODEL A - estimating gridref-year richness (deviation) and mean richness //
  //AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA//
  
  vector[grs] rich_gr = rich_gr_raw * rich_gr_sigma ;
  
  vector[gr_yrs] rgr_yr = 
  rich_gr_yr_raw * rich_gr_yr_sigma;

  vector[gr_yrs] rich_beta_gr_yr =
  rgr_yr + 
  rich_gr[gr_for_gr_yr] ;
  
  // extract mean gridref richness estimates 
  vector[gr_yrs] r_gr_mean; 
  for (i in 1:gr_yrs) {                                                         
    int g = gr_for_gr_yr[i];
    r_gr_mean[i] = rich_gr[g];
  }
 
  vector[gr_yrs] r_int = rgr_yr .* r_gr_mean ; // pre compute interaction term
  
  //AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA//
  // MODEL B2 - estimating gridref-year bird abundance /////////////////////////
  //AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA//

  vector[grs] babnd_gr ;
  babnd_gr = babnd_gr_raw * babnd_gr_sigma ;

  vector[gr_yrs] babnd_beta_gr_yr =                   // site-year level bird abundance is a product of ...
  babnd_gr_yr_raw * babnd_sigma_gr_yr +              // a site-year random effect    
  babnd_gr[gr_for_gr_yr] +                           // a site random effect  
  babnd_rich_mean * r_gr_mean +                     // site level mean richness 
  babnd_rich * rgr_yr +                              // deviance from site-level mean richness
  babnd_rm_int * r_int;                              // and interaction between the two

}

model {
  
  //AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA//
  // MODEL A - estimating gridref-year richness ////////////////////////////////
  //AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA//

  rich_gr_yr_raw ~ normal(0, 1) ;
  rich_gr_yr_sigma ~ exponential(2) ;
  
  rich_gr_raw ~ normal(0, 1) ;
  rich_gr_sigma ~ exponential(2) ;

  for(i in 1:n_bbs_tr_v) {
    real rich_lambda = exp(
      rich_alpha +
      rich_beta_gr_yr[gr_yr_for_tr_v[i]] +
      rich_beta_visit * visit[i]
    ) ;
    rich[i] ~ poisson(rich_lambda) ;
  };

  rich_alpha ~ normal(0, 0.5) ;
  rich_beta_visit ~ normal(0, 0.5) ;
  
  // //AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA//
  // // MODEL B2 - estimating gridref-year bird abundance /////////////////////////
  // //AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA//

  babnd_alpha ~ normal(0, 0.5);
  babnd_beta_visit ~ normal(0, 0.5);

  babnd_rich ~ normal(0, 0.5) ;
  babnd_rich_mean ~ normal(0, 0.5) ;
  babnd_rm_int ~ normal(0, 0.5) ;

  babnd_gr_raw ~ normal(0, 1) ;
  babnd_gr_sigma ~ exponential(2) ;
  babnd_gr_yr_raw ~ normal(0, 1) ;
  babnd_sigma_gr_yr ~ exponential(2) ;

  // Transect-level model
  for(i in 1:n_bbs_tr_v) {
      real babnd_mu = exp(
        babnd_alpha +
        babnd_beta_visit * visit[i] +
        babnd_beta_gr_yr[gr_yr_for_tr_v[i]]
        );
        babnd[i] ~ poisson(babnd_mu);
  }

} 

What’s a posterior ridge? Can you provide a definition and/or reference? There’s a generated quantities block if you want to make predictions. Or for model validation, whatever.

By posterior ridge I mean the positive correlation between posterior parameter estimates. In this case ‘babnd_alpha’ and ‘rich_alpha’ which are parameters for the intercepts of the abundance and richness models respectively. Top left plot in the figure I shared. As far as I understand, these ridges make it hard for the sampler to explore certain parts of posterior space… ‘bad model geometry’ etc.

I don’t think the generated quantities block will be very helpful in this case.

1 Like

Oh, snap, I see what you’re talking about, top left. I had never heard this term before.

Here’s what claude said, I have no idea…

This is a great question — and a really common pain point in multi-component latent variable models in Stan. Let me break down what’s likely happening and what you can do about it.


What’s causing the posterior ridge between intercepts?

The core issue is non-identifiability between the global intercept in Model A (richness) and the intercept in your abundance model. Here’s the intuition:

If you shift rich_alpha up by some amount δ, and shift the abundance intercept down by a compensating amount, the likelihood is unchanged. The model can’t tell these two configurations apart — hence a ridge (or banana-shaped posterior) rather than a clean peak.

This gets worse when you add downstream components (ticks, Borrelia), because now you have chains of intercepts that can all slide against each other.


Is this a known identifiability issue with latent predictors?

Yes, very much so. It’s essentially the same problem as in factor models and SEMs: when a latent variable feeds into multiple outcomes, the scale and location of the latent variable is only identified by the constraints you impose. Specifically:

  • Location: the mean of a latent predictor is absorbed into the intercept of whatever outcome it predicts
  • Scale: the variance of a latent predictor trades off against regression coefficients

When you compute rich_gr and rich_gr_yr as random effects and pass them forward as predictors, you’re passing unanchored quantities into the abundance model.


Why did scaling in transformed parameters make it worse?

Scaling latent quantities by their posterior standard deviation (i.e. dividing by rich_gr_sigma) can actually increase geometry problems because the scale parameter itself is uncertain and correlated with everything else. You end up with a more curved, higher-dimensional ridge.


Recommended fixes

Here’s a hierarchy of approaches, from quick to more structural:

1. Pin the richness intercept (quickest fix) Give rich_alpha a tight, informative prior centered on a sensible value (e.g. the observed mean log-richness). This anchors the location and breaks the ridge without restructuring the model.

stan

rich_alpha ~ normal(observed_mean_log_rich, 0.5);

2. Center and scale the latent predictors before passing them forward

Rather than scaling by the inferred sigma (which is uncertain), subtract the prior mean (zero for random effects) and divide by a fixed constant based on domain knowledge or pilot data:

stan

transformed parameters {
  vector[grs] rich_gr_centered;
  rich_gr_centered = rich_gr - mean(rich_gr);  // removes floating mean
}

This removes the location freedom explicitly.

3. Identify the model structurally (most robust)

The cleanest solution for a multi-component model like this is to sum-to-zero constrain your site-level random effects, which pins the location of the latent variable:

stan

// Use a sum-to-zero parameterization
rich_gr ~ normal(0, rich_gr_sigma);
// And add a soft constraint:
sum(rich_gr) ~ normal(0, 0.001 * grs);

This is analogous to how SEMs handle latent variable identification — you fix one loading or the mean to resolve the trade-off.

4. Rethink the intercept structure across components

Instead of having a free intercept in both the richness model and the abundance model, consider removing the global intercept from the abundance model and instead letting the richness-derived prediction carry the mean. This reduces the number of “floating” parameters by one per downstream component.


On the interaction term

The between × within interaction is particularly prone to collinearity issues because rich_gr and rich_gr_yr are already partially confounded (site years are nested in sites, so the site mean informs the site-year deviation). You might want to verify that the interaction is adding real predictive value before debugging its geometry — sometimes it’s the interaction term that’s widening the ridge.


Summary table

Problem Cause Fix
Intercept ridge (A↔B) Unanchored latent location Pin rich_alpha or center rich_gr
Worse with scaling Sigma uncertainty propagates Use fixed centering constant
Worse with ticks/Borrelia Intercept chains multiply Sum-to-zero constraints per component
Interaction geometry Between/within collinearity Check if interaction is needed

Would it help if I sketched out a restructured Stan model block incorporating these fixes? I can also visualize the identifiability structure of your model as a path diagram if that would help you reason about where the degrees of freedom are floating.

While it’s good practice to flag LLM contributions to posts (appreciate it!), I think it’s generally discouraged to post replies that consist entirely of unvetted/unverified LLM output.

I’d say it’s fair to assume that any user of this forum has access to an LLM, and questions posted here are often looking for a level of insight or reliability that the user feels LLMs do not currently offer.

3 Likes

I was indeed hoping for insights beyond what LLM’s currently offer.

do you have any recommendations @jsocolar ?? I’d seen you’re quite active on the forum

Do you think you could write out with math or regression-formula style notation what the model you intend to fit here is?

Wait, I just compared claude generated GP regression with the stuff I built and it’s slop. Are you guys making fun of me? It’s garbage. I did an AI interview, and I’m like, this isn’t valid stan code…

Here: I don't know, I'm still beating Claude, my coding style is cleaner and I'm using pre-existing functions (yes, Stan has covariance matrices, matern kernels, whatever, and cholesky decomposition built… | Andre Marino Zapico