Horseshoe prior in rstan

Hi. i am using the horseshoe prior, coded by stan as following:

data {
  int<lower=1> n; // Number of data
  int<lower=1> p; // Number of covariates
  matrix[n,p] X;  // n-by-p design matrix
  real y[n];      // n-dimensional response vector
}

parameters {
  vector[p] beta;
  vector<lower=0>[p] lambda;
  real<lower=0> tau;
  real<lower=0> sigma;
}

transformed parameters {
  vector[n] theta ;
  theta = X * beta;
}

model {
  lambda ~ cauchy(0, 1);
  tau ~ cauchy(0, 1);
  beta ~ normal(0, sigma * tau * lambda); 
  y ~ normal(theta, sigma);
}

I don’t know why, but it is extremely slow.
Would you give me any recommendations to speed up??

EDIT: @maxbiostat edited this post for syntax highlighting.

  • What do you mean by extremely slow? Days?
  • How large are n and p?

To speed up

Or you could use brms or rstanarm packages which have these speed ups already implemented.

4 Likes

Thank you very much for your reply.

20,000 iterations takes nearly 30 minutes or more, which may just take 1 minute in ordinary MCMC sampler. n and p are quite small, n = 100, p =500.

Why are you running 20K iterations, though? What is the lowest ESS you get?

I am running 20K because I usually use this much in sampling to see a stationary.

I’ll let @avehtari be the final judge of that, but I find it weird it would take Stan 20K iterations to warmup properly for a problem of this size.

To speed up I used the parameter expansion for the half-Cauchy as follows:
data {
int<lower=1> n; // Number of data
int<lower=1> p; // Number of covariates
matrix[n,p] X; // n-by-p design matrix
real y[n]; // n-dimensional response vector
}

parameters {
vector[p] beta;
vector<lower=0>[p] lambda_sq;
vector<lower=0>[p] omega;
real<lower=0> tau_sq;
real<lower=0> eta;
real<lower=0> sigma;
}

transformed parameters {
vector[n] theta ;
theta = X * beta;
}

model {
// Origianal horseshoe by carvalho + Parameter expansion on half-Cauchy
//lambda ~ cauchy(0, 1) is equivalent with;
omega ~ inv_gamma(1 / 2, 1);
lambda_sq ~ inv_gamma(1 / 2, 1 ./ omega);
//tau ~ cauchy(0, 1) is equivalent with;;
eta ~ inv_gamma(1 / 2, 1);
tau_sq ~ inv_gamma(1 / 2, 1 / eta);
beta ~ normal(0, sigma * sqrt(tau_sq) * sqrt(lambda_sq) );
y ~ normal(theta, sigma);
}

but I don’t why this does not work…

How do you diagnose stationarity? How big is your data, that is, n and p?

How about the speed ups I recommended?

I check stationarity by see the traceplot in my naked eyes.
For simulation study, I typically use n = 100 p =500, but in gene expression data study, obviously p is much larger than 500, saying 5000.

That is not sufficient. If you are using RStan you can use monitor function.

See here for an example with p>5000


and the comments about the speed What's the highest dimensional model Stan can fit using NUTS?
With my 3+ year old laptop it’s 2 hours, with GPUs it would be less than 40mins