Stan model with horseshoe prior produces high Rhat

anthony_462 · February 15, 2023, 6:12pm

Hi everybody,

Aim
My Stan model aims to fit count data over time, taking into account a temporary increase (see figure).

Model
To capture this temporary increase, I use a random effect with horseshoe prior. I assume that counts are Poisson distributed. The model reads:

Y_i\sim \text{Poisson}(\lambda_i)

where \log\lambda_i = \mu_0 + f_i +\log P and

Y_i is the number of occurrences over time i, \lambda_i the (latent) mean number of occurences, \mu_0 the baseline log-transformed individual risk of event occurring, f_i is the change in risk over time and P is the population size (constant) over time.
In other words, I use a log link function to model the individual risk over time, which is a sum of the baseline risk \mu_0 and a random effect f_i.
Note that this example is a minimal reproducible example, which is much simpler than the one I use but which illustrates the issue.
As f_i is 0 most of the time, I use a regularized horseshoe prior to identify it, following instruction from @avehtari and @jpiironen (see Appendix C).

Simulation
To test the model, I performed a small simulation study, where I simulated count data using a Poisson distribution over time (i=1,\cdots,300). I assume a baseline risk of 1% (i.e., \mu_0 = \log(0.01) and modeled the increase in risk f_i using two sigmoid functions. I generated datasets considering three population sizes: P=10^6, 10^7, 10^8. The three simulated datasets are displayed here:

Results
I then use the same Stan model to fit the three datasets. It produces good fits for the three population sizes.

However, the fits with P=10^7 and 10^8 have Rhat values exceeding 1.05 (see “max_rhat” column):

stan4

It looks like Rhat values are increasing when we increase the population size.
For this reason, I suspect that this is due to an error in the way I provided the values of the parameters that control the horseshoe prior but I haven’t been able to fix this.
In particular, I have doubts about the \sigma^2 parameter, which represents the pseudo-variance of the distribution. In the paper, it is recommended to use the inverse of the mean number of events, when assuming a Poisson distribution. Therefore, I took \sigma^{-1} = \exp(\mu)*P. Is it a correct choice or does the high Rhat values come from this uncorrect specification?
You can find the R code here (6.6 KB) and the Stan model below.
.
Many thanks in advance,
Anthony

Stan model

functions {
}

// load data objects
data {
  //dimensions
  int N_x; //number of datapoint
  array[N_x] int x;
  array[N_x] int y;
  int pop;
  
  //Hyperprior parameters
  real p_mu0_mean;
  real p_mu0_sd;
  
  //horseshoe prior
  real < lower =0 > scale_global ; // scale for the half -t prior for tau
  real < lower =1 > nu_global ; // degrees of freedom for the half -t priors for tau
  real < lower =1 > nu_local ; // degrees of freedom for the half - t priors for lambdas
  real < lower =0 > slab_scale ; // slab scale for the regularized horseshoe
  real < lower =0 > slab_df ; // slab degrees of freedom for the regularized horseshoe
  
  int inference;
}

parameters {
  real mu0; //intercept mortality
  
  vector [N_x] f3_z;
  real < lower =0 > tau_hs; // global shrinkage parameter
  vector < lower =0 >[N_x] lambda_hs; // local shrinkage parameter
  real < lower =0 > caux_hs ;
}

transformed parameters {
  vector[N_x] f3;
  vector < lower =0 >[N_x] lambda_tilde_hs ; // ’ truncated ’ local shrinkage parameter
  real < lower =0 > c_hs; // slab scale
  real <lower=0> sigma_hs;
  array[N_x] real mu;
    
  c_hs = slab_scale * sqrt(caux_hs);
  lambda_tilde_hs = sqrt(c_hs^2 * square(lambda_hs) ./ (c_hs^2 + tau_hs^2* square(lambda_hs))); //regularized lambda
  f3 = f3_z .* lambda_tilde_hs * tau_hs; //normal(0,lambda_tilde^2*tau^2)

  //log mortality rate
  for(j in 1:N_x){
      mu[j] = mu0 + f3[j] + log(pop);
  }
  sigma_hs = 1/sqrt(exp(mu0)*pop);
}

model {
  //intercept
  mu0 ~ normal(p_mu0_mean,p_mu0_sd);
  
  f3_z ~ normal(0, 1);
  lambda_hs ~ student_t(nu_local, 0, 1); //
  tau_hs ~ student_t(nu_global, 0, scale_global * sigma_hs); //df, mean, sd, df=1 is Cauchy
  caux_hs ~ inv_gamma(0.5*slab_df, 0.5*slab_df);
  
  // likelihood             
  if(inference==1){
    for(i in 1:N_x){
      target += poisson_log_lpmf(y[i] |  mu[i]);
    }
  }
}

generated quantities {
}

R code

avehtari · February 16, 2023, 3:32pm

The code in the appendix C is using non-centered parameterization which is good when the likelihood is weak. With increasing P, the likelihood gets more informative (less noisy) and then the geometry of the non-centered posterior gets worse. You can switch to centered parameterization. See 25.7 Reparameterization | Stan User’s Guide

anthony_462 · February 20, 2023, 11:17am

Thanks @avehtari !
I tried to replace the student t distribution of the local shrinkage parameters \lambda by a gamma and a normal distribution as suggested by the link you mentioned, but unfortunately, this didn’t solve the issue of high R-hat, as the simulation with P=10^8 still produces Rhat around 1.5 for some of the \lambda parameters.
Note that I did not reparametrize the global shrinkage parameter \tau, as it always has low Rhat.

avehtari · February 20, 2023, 5:22pm

Did you try to switch from the non-centered parameterization to centered parameterizaton? The linked chapter discusses the case from centered to non-centered, but you hopefully can see from that how to do it in other direction, too

betanalpha · March 7, 2023, 8:58pm

The horseshoe population model is almost impossible to fit accurately due to its inherit geometry. For more discussion, including extensive experiments and some potential alternatives, see https://betanalpha.github.io/assets/case_studies/modeling_sparsity.html#2223_The_Horseshoe_Population_Model.

Topic		Replies	Views
Horseshoe prior with diabetes dataset Modeling fitting-issues	4	3279	May 12, 2017
Time Series with Horseshoe Prior Modeling specification	2	509	February 14, 2023
Can someone double check to make sure my Stan code is correct for an ordered logistic regression with a horseshoe prior? Modeling rstan , specification	2	381	December 1, 2023
Horseshoe prior in rstan RStan	9	5026	April 28, 2020
Question about Rhat statistics General rstan	5	639	June 25, 2020

Stan model with horseshoe prior produces high Rhat

Related topics