Curious if anyone has noticed this before and if there’s a known solution of some sort. I’ve been playing with adding partial-pooling of the variances in hierarchical models (typical implementations like SUG1.13 leave these variances unpooled) and I consistently encounter a few post-warmup divergences even with very long warmup.

For context here’s a simple hierarchical model with the pooled-variances part highlighted through comments:

```
data{
// nI: number of individuals
int<lower=2> nI ;
// nXc: number of condition-level predictors
int<lower=2> nXc ;
// rXc: number of rows in the condition-level predictor matrix
int<lower=nXc> rXc ;
// Xc: condition-level predictor matrix
matrix[rXc,nXc] Xc ;
// iXc: which individual is associated with each row in Xc
array[rXc] int<lower=1,upper=nI> iXc ;
// nY: num entries in the observation vector
int<lower=nI> nY ;
// Y: observations
vector[nY] Y ;
// yXc: which row in Xc is associated with each observation in Y
array[nY] int<lower=1,upper=rXc> yXc ;
}
transformed data{
// tXc: transposed copy of Xc
matrix[nXc,rXc] tXc = transpose(Xc) ;
}
parameters{
// locat_coef: coefficients associated with location of the Gaussian likelihood
vector[nXc] locat_coef ;
// indiv_locat_coef_sd: magnitude of variability among individuals within a group
vector<lower=0>[nXc] indiv_locat_coef_sd ;
// NOTE: indiv_locat_coef_sd is usually modelled in an unpooled manner whereby
// the individual elements don't mutually inform, ex:
// indiv_locat_coef_sd ~ std_normal() ;
// To partially-pool, we could do instead:
// indiv_locat_coef_sd ~ normal(m,s) ;
// Or maybe:
// indiv_locat_coef_sd ~ lognormal(lm,ls) ;
// Thus need two more parameters:
// mean_indiv_locat_coef_sd: mean of the normal population from which the
// values in indiv_locat_coef_sd are considered samples
real<lower=0> mean_indiv_locat_coef_sd ;
// mean_indiv_locat_coef_sd: sd of the normal population from which the
// values in indiv_locat_coef_sd are considered samples
real<lower=0> sd_indiv_locat_coef_sd ;
// helper_indiv_locat_coef: helper-variable for non-centered parameterization of indiv_locat_coef
matrix[nXc,nI] helper_indiv_locat_coef ;
// noise: SD of Gaussian likelihood
real<lower=0> noise ;
}
model{
////
// group-level structure
////
// standard-normal priors on all group-level coefficients
locat_coef ~ normal(0,10) ;
noise ~ weibull(2,1) ;
////
// individual-level structure
////
// priors for the hyper-parameters for partial-pooling of indiv_locat_coef_sd
mean_indiv_locat_coef_sd ~ std_normal() ;
sd_indiv_locat_coef_sd ~ std_normal() ;
// both normal & lognormal priors for indiv_locat_coef_sd seem to yield
// few-but-consistent divergences
indiv_locat_coef_sd ~ normal(mean_indiv_locat_coef_sd,sd_indiv_locat_coef_sd) ;
// indiv_locat_coef_sd ~ lognormal(mean_indiv_locat_coef_sd,sd_indiv_locat_coef_sd) ;
// indiv_locat_coef_ must have a standard-normal prior
to_vector(helper_indiv_locat_coef) ~ std_normal() ;
////
// observation-level structure
////
// observations as normal with common error variability and varying mean
Y ~ normal(
// in-place computation of the Gaussian locations implied by the coeffi-
// cients & contrasts associated with each observation.
columns_dot_product(
(
rep_matrix(locat_coef,nI)
+ (
rep_matrix(indiv_locat_coef_sd,nI)
.* helper_indiv_locat_coef
)
)[,iXc]
, tXc
)[yXc]
, noise
) ;
}
```

As the comments note, I seem to encounter divergences regardless of whether the partial-pooling is applied on the normal or log-normal scales. Any thoughts?