What values of warmup, iter, and thin are recommended for high-dimensional problems?

I know this is a general question.
When I run my own sampling scheme on a high-dimensional problem, say high-dimensional linear regression with n = 100 and p = 500, I typically use
burn = 10000, nmc = 10000 (number of samples kept after burn-in), and thin = 10, for publication purposes.
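In Stan's parameterization, that would correspond roughly to the call below (iter counts warmup draws too, so iter = burn + nmc; the model file and data list are just placeholders for a sparse regression model):

```r
library(rstan)

# Rough equivalent of burn = 10000, nmc = 10000, thin = 10:
# iter includes warmup, so iter = 10000 + 10000 = 20000.
fit <- stan(
  file   = "sparse_regression.stan",             # placeholder model file
  data   = list(N = 100, P = 500, X = X, y = y), # placeholder data list
  chains = 4,
  warmup = 10000,
  iter   = 20000,
  thin   = 10
)
```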

However, when I run stan(), I find that Stan is “extremely” slow.
In my experience, it is even slower than a simple Metropolis-Hastings algorithm.
I know that Hamiltonian Monte Carlo can take a long time with heavy-tailed priors, which frequently arise as Bayesian sparsity priors, but this seems too slow to use…

So I am curious:
what values do you usually use for (warmup, iter, thin)?

It probably depends on the context. However, afaik, 20,000 samples per chain might be a bit of overkill. If you need thinning to deal with high autocorrelation (or you are seeing divergences), that suggests there’s something iffy going on with the posterior. Maybe try tightening up the priors & sampling the default 2,000 iters per chain, with no thinning? Tightening up priors may be important given there’s not that much data.

Also, importantly, what is your hierarchical prior on the sd of the regression slopes? I’ve been able to run horseshoe & normal (i.e. “Bayesian ridge”) just fine on a dataset with ~10,000 obs and 500 predictors, with the default settings for the number of iterations & sampler controls (i.e. adapt_delta, max_treedepth).
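For reference, such a horseshoe fit with brms and all-default settings looks roughly like this (d is a placeholder data frame with the outcome y and the predictors; treat it as a sketch, not a recipe):

```r
library(brms)

# Sketch: horseshoe prior on the regression slopes, default number of
# iterations (iter = 2000, warmup = 1000) and default sampler controls.
fit <- brm(
  y ~ .,
  data   = d,                                    # placeholder data frame
  family = gaussian(),
  prior  = set_prior("horseshoe(1)", class = "b"),
  chains = 4
)
summary(fit)
```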

Thank you very much for your suggestion.
I am interested in the opposite of your situation.
In my case, the number of observations is limited to 500 at most, but the number of predictors is at least 3000, because my typical application area is gene-expression data.

Yeah, p >> n oughta make things more challenging. However, afaik, a hierarchical normal prior and the horseshoe should work well in this scenario (especially the horseshoe). I’ve heard some mixed things about the Laplace (double-exponential) prior, which is sometimes compared to the LASSO (which afaik is commonly used in this line of work), so if you’re using the Laplace prior & getting sampling problems, maybe try the horseshoe? If you’re interested in a truly sparse option & want to recover a few genes of interest, you could also try Aki Vehtari’s projpred package (https://mc-stan.org/projpred/) - from my experience with it, it works super well.
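If it helps, the projpred workflow I had in mind is roughly the following (fit is a reference model fitted with brms or rstanarm, e.g. the horseshoe fit sketched above; function names as in the projpred documentation):

```r
library(projpred)

# Sketch: projection predictive variable selection on top of a reference model.
refm <- get_refmodel(fit)    # wrap the fitted brms/rstanarm reference model
vs   <- cv_varsel(refm)      # cross-validated forward search over predictors
plot(vs, stats = "elpd")     # predictive performance vs. number of predictors
suggest_size(vs)             # heuristic for how many predictors to keep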

The horseshoe prior already has a widely used R package, “horseshoe”, which is very fast: it uses a slice sampler.
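For comparison, the call I use with the horseshoe package looks roughly like this (argument names as in the package documentation; the data here are simulated only for illustration):

```r
library(horseshoe)

# Simulated p >> n data, just for illustration
set.seed(1)
n <- 100; p <- 500
X <- matrix(rnorm(n * p), n, p)
beta <- c(rep(2, 5), rep(0, p - 5))
y <- as.vector(X %*% beta + rnorm(n))

# Slice/Gibbs sampler for linear regression under the horseshoe prior
fit <- horseshoe(y, X,
                 method.tau   = "halfCauchy",  # full Bayes for the global shrinkage scale
                 method.sigma = "Jeffreys",    # Jeffreys prior on the error variance
                 burn = 1000, nmc = 5000, thin = 1)
str(fit)  # posterior summaries and samples for beta, tau, sigma^2
```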

I am interested in identifying drawbacks of stan() in high-dimensional problems, and their remedies, by comparing it against the horseshoe prior and its variants.

Aki Vehtari’s prior is called the “regularized horseshoe”, and this is also slow when run through stan(): I tried it.

Here, what I mean by slow is that it takes 10 minutes with the default settings in Stan, whereas the “horseshoe” package takes only 2 minutes or less.

Idk, 10 minutes for a p >> n problem doesn’t seem very slow to me (I mean it’s always going to be slower than machine learning methods, but you don’t get a full posterior with those). I have no experience with the horseshoe package or slice sampling. Looking at the vignette, it seems fine; one thing that seems a bit odd is that it uses a Jeffreys prior for sigma by default, which afaik is often discouraged, although I don’t know how that plays out for the horseshoe specifically. It also doesn’t seem to implement functions for model checking - posterior predictive checks, cross-validation, etc. - so that might be one advantage of Stan/brms.
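For example, with a brms fit those checks are one-liners (assuming fit is a brmsfit object, as in the sketch above):

```r
library(brms)

# Posterior predictive check: simulated vs. observed outcome distributions
pp_check(fit, ndraws = 50)

# Approximate leave-one-out cross-validation via PSIS-LOO
loo(fit)
```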
