R Studio Abort with Finnish Horseshoe specification

Hi all,
I’m fitting the Finnish horseshoe using a specification I found in a webpage by Michael Betancourt.

Bayes Sparse Regression.

This specification runs without the control=list(adapt_delta=0.99, max_treedepth=15) but induces about 400 divergences. When add the control=list(adapt_delta=0.99, max_treedepth=15), R Studio aborts. I’m running R Studio Version 2021.09.0 Build 351, on a Mac with Big Sur. Here is the code. As always, thank you.

PISA18sampleScale <- read.csv(file.choose(),header=T)

n <- nrow(PISA18sampleScale)
X <- PISA18sampleScale[,2:11]
readscore <- PISA18sampleScale[,1]
p <- ncol(X)


data.list <- list(n=n, p=p, X=X, readscore=readscore)

# Stan code adapated from Vehtari, Appendix C. #
modelString = "
data {
  int <lower=1> n;          // number of observations
  int <lower=1> p;          // number of predictors
  real readscore[n];        // outcome
  matrix[n,p] X;          //inputs

}

// slab_scale = 5, slab_df = 25


transformed data {
  real p0 = 5;
  real slab_scale = 3;
  real slab_scale2 = square(slab_scale);
  real slab_df = 25;
  real half_slab_df = 0.5 * slab_df;
  
}

parameters {
  vector[p] beta_tilde;
  vector<lower=0>[p] lambda;
  real<lower=0> c2_tilde;
  real<lower=0> tau_tilde;
  real alpha;
  real<lower=0> sigma;

}


transformed parameters {
  vector [p] beta ; // regression coefficients
{
 real tau0 = (p0 / (p - p0)) * (sigma / sqrt(1.0 * n));
    real tau = tau0 * tau_tilde; // tau ~ cauchy(0, tau0)
    
    // c2 ~ inv_gamma(half_slab_df, half_slab_df * slab_scale2)
    // Implies that marginally beta ~ student_t(slab_df, 0, slab_scale)
    
    real c2 = slab_scale2 * c2_tilde;
    
    vector[p] lambda_tilde =
      sqrt( c2 * square(lambda) ./ (c2 + square(tau) * square(lambda)) );
    
    // beta ~ normal(0, tau * lambda_tilde)
    beta = tau * lambda_tilde .* beta_tilde;
    
    }
}
 
model {
  beta_tilde ~ normal(0, 1);
  lambda ~ cauchy(0, 1);
  tau_tilde ~cauchy(0,1);
  c2_tilde ~ inv_gamma(half_slab_df, half_slab_df);
  
  alpha ~ normal(0, 2);
  sigma ~ normal(0, 2);
  
  readscore ~ normal(X * beta + alpha, sigma);

}

"

# Start estimation

nChains = 4
nIter= 50000
thinSteps = 10
warmupSteps = floor(nIter/2)

readscore = data.list$readscore

myfitFinnHorseshoe = stan(data=data.list,model_code=modelString,chains=nChains,
                          iter=nIter,warmup=warmupSteps,thin=thinSteps,
                          control=list(adapt_delta=0.99, max_treedepth=15))

This is something that we’ve been seeing lately with the CRAN rstan, you can either try cmdstanr: R Interface to CmdStan • cmdstanr

Or the preview of the next rstan version:

remove.packages(c("StanHeaders", "rstan"))
install.packages("StanHeaders", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))
install.packages("rstan", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))

The next version of rstan works very well.

On a somewhat related topic, perhaps relevant for @avehtari, is there a resource providing guidance on how to choose the slab scale and slab df? I am getting beautiful trace plots, perfect rhats, and very good n_eff. However the density plots really don’t look very good.

Rplot.pdf (55.2 KB)

Thanks.

This is the best resource Sparsity information and regularization in the horseshoe and other shrinkage priors. The slab is more important, e.g., in logistic regression where with many covariates complete separation is more likely. You can set the slab parameters as how you would set your prior for coefficients if oracle would tell you which covariates are irrelevant.

Looking at the plots, I see that you don’t have many covariates, and I guess that you have many more observations than covariates. In such case there is a smaller benefit from horseshoe. Also, I guess that you have correlating covariates. You could look at the pairs scatter plots for that. You could continue with projection predictive variable selection (see examples at Model selection tutorials and talks). If your model is not much more complex than what you now have, you could use brms or rstanarm to make your model and use CRAN - Package projpred to do the rest of the work. See projpred examples at Model selection tutorials and talks.

I should have mentioned that this analysis was based on a small sample from PISA. I have n=100, and p=10.

Playing with the slab seemed to help a bit, but there are still some density plots that don’t quite look as good as one would hope. However, the results seem roughly consistent with theory insofar as the small coefficients from the traditional horseshoe are even a bit smaller with the Finnish horseshoe. The larger ones (and there weren’t many) shrunk a bit too. I’m not sure how to get the density plots to look better, apart from your suggestions, but I want to stick to demonstrating the Finnish horseshoe. Again, all the other diagnostics are spot on.

What would you hope to see?

The demonstration for the posterior marginals would be more clear with a larger p.