Stan model with horseshoe prior produces high Rhat

Hi everybody,

Aim
My Stan model aims to fit count data over time, taking into account a temporary increase (see figure).

Model
To capture this temporary increase, I use a random effect with a horseshoe prior. I assume that the counts are Poisson distributed. The model reads:

Y_i\sim \text{Poisson}(\lambda_i)

where \log\lambda_i = \mu_0 + f_i +\log P and

Y_i is the number of occurrences at time i, \lambda_i the (latent) mean number of occurrences, \mu_0 the log-transformed baseline individual risk of an event occurring, f_i the change in risk over time, and P the population size (constant over time).
In other words, I use a log link function to model the individual risk over time, which is the sum of the baseline risk \mu_0 and a random effect f_i.
Note that this example is a minimal reproducible example, which is much simpler than the one I use but which illustrates the issue.
As f_i is 0 most of the time, I use a regularized horseshoe prior to identify it, following the instructions of @avehtari and @jpiironen (see Appendix C of their paper).
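For reference, written out in the notation of that paper (with s the slab scale and \tau_0 the global scale, i.e. slab_scale and scale_global in the Stan code below), the regularized horseshoe prior is:

```latex
\begin{aligned}
f_i \mid \tau, \lambda_i, c &\sim \mathcal{N}\!\left(0,\ \tau^2 \tilde{\lambda}_i^2\right), &
\tilde{\lambda}_i^2 &= \frac{c^2 \lambda_i^2}{c^2 + \tau^2 \lambda_i^2},\\
\lambda_i &\sim t^{+}_{\nu_{\mathrm{local}}}(0, 1), &
\tau &\sim t^{+}_{\nu_{\mathrm{global}}}(0,\ \tau_0 \, \sigma),\\
c^2 &= s^2 \, c_{\mathrm{aux}}, &
c_{\mathrm{aux}} &\sim \mathrm{Inv\text{-}Gamma}\!\left(\tfrac{\nu_{\mathrm{slab}}}{2},\ \tfrac{\nu_{\mathrm{slab}}}{2}\right).
\end{aligned}
```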

Simulation
To test the model, I performed a small simulation study, where I simulated count data using a Poisson distribution over time (i=1,\cdots,300). I assumed a baseline risk of 1% (i.e., \mu_0 = \log(0.01)) and modeled the increase in risk f_i using two sigmoid functions. I generated datasets considering three population sizes: P=10^6, 10^7, 10^8. The three simulated datasets are displayed here:
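A minimal Python sketch of this simulation setup (the sigmoid midpoints, slope, and magnitude below are illustrative placeholders, not the exact values used in my R code):

```python
# Sketch of the simulation: Y_i ~ Poisson(exp(mu0 + f_i) * P),
# with f_i a temporary increase built from two sigmoids.
# The sigmoid midpoints (100, 150), slope (5), and magnitude (0.5)
# are illustrative assumptions, not the values of the original R code.
import numpy as np

rng = np.random.default_rng(1)

N = 300
mu0 = np.log(0.01)            # baseline individual risk of 1%
t = np.arange(1, N + 1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# rise around t = 100, return to baseline around t = 150
f = 0.5 * (sigmoid((t - 100) / 5) - sigmoid((t - 150) / 5))

for P in (10**6, 10**7, 10**8):
    lam = np.exp(mu0 + f) * P  # log(lambda_i) = mu0 + f_i + log(P)
    y = rng.poisson(lam)
    print(f"P = {P:.0e}: counts range {y.min()}..{y.max()}")
```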

Results
I then used the same Stan model to fit the three datasets; it produces good fits for all three population sizes.

However, the fits with P=10^7 and 10^8 have Rhat values exceeding 1.05 (see “max_rhat” column):


Rhat seems to increase with the population size.
I therefore suspect an error in the way I provide the values of the parameters that control the horseshoe prior, but I haven't been able to fix it.
In particular, I have doubts about the \sigma^2 parameter, which represents the pseudo-variance of the likelihood. The paper recommends using the inverse of the mean number of events when assuming a Poisson distribution, so I took \sigma^{-2} = \exp(\mu_0) P, i.e. \sigma = 1/\sqrt{\exp(\mu_0) P}. Is this a correct choice, or do the high Rhat values come from an incorrect specification here?
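To make the scales concrete, here is a quick Python computation reproducing the sigma_hs line of the Stan model for the three population sizes:

```python
# Pseudo-sigma as computed in the Stan model:
# sigma_hs = 1 / sqrt(exp(mu0) * pop), i.e. sigma^2 = 1 / (expected counts).
import math

mu0 = math.log(0.01)  # baseline risk of 1%
for P in (10**6, 10**7, 10**8):
    sigma = 1.0 / math.sqrt(math.exp(mu0) * P)
    print(f"P = {P:.0e}: sigma_hs = {sigma:.4g}")
```

So the scale of the half-t prior on \tau shrinks by a factor of \sqrt{10} with each tenfold increase in P.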
You can find the R code here (6.6 KB) and the Stan model below.
Many thanks in advance,
Anthony


Stan model


// load data objects
data {
  // dimensions
  int N_x;               // number of data points
  array[N_x] int x;      // time points
  array[N_x] int y;      // observed counts
  int pop;               // population size (constant over time)

  // hyperprior parameters
  real p_mu0_mean;
  real p_mu0_sd;

  // horseshoe prior
  real<lower=0> scale_global; // scale for the half-t prior for tau
  real<lower=1> nu_global;    // degrees of freedom for the half-t prior for tau
  real<lower=1> nu_local;     // degrees of freedom for the half-t priors for lambdas
  real<lower=0> slab_scale;   // slab scale for the regularized horseshoe
  real<lower=0> slab_df;      // slab degrees of freedom for the regularized horseshoe

  int inference;              // 1 = evaluate the likelihood, 0 = prior only
}

parameters {
  real mu0;                       // intercept: baseline log mortality

  vector[N_x] f3_z;               // standardized random effect (non-centered)
  real<lower=0> tau_hs;           // global shrinkage parameter
  vector<lower=0>[N_x] lambda_hs; // local shrinkage parameters
  real<lower=0> caux_hs;          // auxiliary variable for the slab
}

transformed parameters {
  vector[N_x] f3;
  vector<lower=0>[N_x] lambda_tilde_hs; // 'truncated' local shrinkage parameter
  real<lower=0> c_hs;                   // slab scale
  real<lower=0> sigma_hs;               // pseudo-sigma of the Poisson likelihood
  array[N_x] real mu;

  c_hs = slab_scale * sqrt(caux_hs);
  lambda_tilde_hs = sqrt(c_hs^2 * square(lambda_hs) ./ (c_hs^2 + tau_hs^2 * square(lambda_hs))); // regularized lambda
  f3 = f3_z .* lambda_tilde_hs * tau_hs; // f3 ~ normal(0, lambda_tilde^2 * tau^2)

  // log mortality rate
  for (j in 1:N_x) {
    mu[j] = mu0 + f3[j] + log(pop);
  }
  sigma_hs = 1 / sqrt(exp(mu0) * pop); // sigma^2 = inverse of the expected number of events
}

model {
  //intercept
  mu0 ~ normal(p_mu0_mean,p_mu0_sd);
  
  f3_z ~ normal(0, 1);
  lambda_hs ~ student_t(nu_local, 0, 1);
  tau_hs ~ student_t(nu_global, 0, scale_global * sigma_hs); // (df, location, scale); df = 1 gives a half-Cauchy
  caux_hs ~ inv_gamma(0.5*slab_df, 0.5*slab_df);
  
  // likelihood
  if (inference == 1) {
    target += poisson_log_lpmf(y | mu); // vectorized over the N_x observations
  }
}



The code in the appendix C is using non-centered parameterization which is good when the likelihood is weak. With increasing P, the likelihood gets more informative (less noisy) and then the geometry of the non-centered posterior gets worse. You can switch to centered parameterization. See 25.7 Reparameterization | Stan User’s Guide
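Concretely, for the random effect in the model above, the switch means declaring f3 as a parameter and giving it its normal prior directly, instead of building it from the standardized f3_z (a sketch of the two parameterizations):

```latex
\underbrace{f_i = \tau \tilde{\lambda}_i z_i,\quad z_i \sim \mathcal{N}(0, 1)}_{\text{non-centered}}
\qquad\Longrightarrow\qquad
\underbrace{f_i \sim \mathcal{N}\!\left(0,\ \tau^2 \tilde{\lambda}_i^2\right)}_{\text{centered}}
```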

Thanks @avehtari !
I tried replacing the Student-t distribution of the local shrinkage parameters \lambda by a gamma and by a normal distribution, as suggested in the chapter you linked, but unfortunately this didn't solve the high-Rhat issue: the simulation with P=10^8 still produces Rhat around 1.5 for some of the \lambda parameters.
Note that I did not reparameterize the global shrinkage parameter \tau, as it always has low Rhat.

Did you try to switch from the non-centered parameterization to the centered parameterization? The linked chapter discusses going from centered to non-centered, but hopefully you can see from that how to do it in the other direction, too.

The horseshoe population model is almost impossible to fit accurately due to its inherent geometry. For more discussion, including extensive experiments and some potential alternatives, see https://betanalpha.github.io/assets/case_studies/modeling_sparsity.html#2223_The_Horseshoe_Population_Model.