Hi. i am using the horseshoe prior, coded by stan as following:
data {
int<lower=1> n; // Number of data
int<lower=1> p; // Number of covariates
matrix[n,p] X; // n-by-p design matrix
real y[n]; // n-dimensional response vector
}
parameters {
vector[p] beta;
vector<lower=0>[p] lambda;
real<lower=0> tau;
real<lower=0> sigma;
}
transformed parameters {
vector[n] theta ;
theta = X * beta;
}
model {
lambda ~ cauchy(0, 1);
tau ~ cauchy(0, 1);
beta ~ normal(0, sigma * tau * lambda);
y ~ normal(theta, sigma);
}
I don’t know why, but it is extremely slow.
Would you give me any recommendations to speed up??
EDIT: @maxbiostat edited this post for syntax highlighting.
What do you mean by extremely slow? Days?
How large are n and p?
To speed up
Or you could use brms
or rstanarm
packages which have these speed ups already implemented.
5 Likes
Thank you very much for your reply.
20,000 iterations takes nearly 30 minutes or more, which may just take 1 minute in ordinary MCMC sampler. n and p are quite small, n = 100, p =500.
Why are you running 20K iterations, though? What is the lowest ESS you get?
I am running 20K because I usually use this much in sampling to see a stationary.
I’ll let @avehtari be the final judge of that, but I find it weird it would take Stan 20K iterations to warmup properly for a problem of this size.
1 Like
To speed up I used the parameter expansion for the half-Cauchy as follows:
data {
int<lower=1> n; // Number of data
int<lower=1> p; // Number of covariates
matrix[n,p] X; // n-by-p design matrix
real y[n]; // n-dimensional response vector
}
parameters {
vector[p] beta;
vector<lower=0>[p] lambda_sq;
vector<lower=0>[p] omega;
real<lower=0> tau_sq;
real<lower=0> eta;
real<lower=0> sigma;
}
transformed parameters {
vector[n] theta ;
theta = X * beta;
}
model {
// Origianal horseshoe by carvalho + Parameter expansion on half-Cauchy
//lambda ~ cauchy(0, 1) is equivalent with;
omega ~ inv_gamma(1 / 2, 1);
lambda_sq ~ inv_gamma(1 / 2, 1 ./ omega);
//tau ~ cauchy(0, 1) is equivalent with;;
eta ~ inv_gamma(1 / 2, 1);
tau_sq ~ inv_gamma(1 / 2, 1 / eta);
beta ~ normal(0, sigma * sqrt(tau_sq) * sqrt(lambda_sq) );
y ~ normal(theta, sigma);
}
but I don’t why this does not work…
How do you diagnose stationarity? How big is your data, that is, n and p?
How about the speed ups I recommended?
I check stationarity by see the traceplot in my naked eyes.
For simulation study, I typically use n = 100 p =500, but in gene expression data study, obviously p is much larger than 500, saying 5000.
That is not sufficient. If you are using RStan you can use monitor function.
Se_Yoon_Lee:
For simulation study, I typically use n = 100 p =500, but in gene expression data study, obviously p is much larger than 500, saying 5000.
See here for an example with p>5000
Program and data are not secret. I was just in a hurry when writing that post.
Data is Prostate_GE listed at http://featureselection.asu.edu/datasets.php
The model is described in Piironen and Vehtari (2017) https://projecteuclid.org/euclid.ejs/1513306866 . The code is slightly modified from appendix C.1. There are less transformed parameters saved and bernoulli_logit_glm() is used instead of bernoulli_logit() (this gives about 4x speedup).
bernoulli_logit_glm_rhs.stan code is
data {
int<lo…
and the comments about the speed What's the highest dimensional model Stan can fit using NUTS? - #5 by avehtari
With my 3+ year old laptop it’s 2 hours, with GPUs it would be less than 40mins
1 Like