What values of warmup, iter, and thin are recommended for high-dimensional problems?

I know this is a general question.
When I run my own sampling scheme on a high-dimensional problem, say high-dimensional linear regression with n = 100 and p = 500, I typically use
burn = 10000, nmc = 10000 (number of samples kept after burn-in), and thin = 10, for publication purposes.
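In Stan's parameterization, that would correspond roughly to the call below (iter counts warmup draws too, so iter = burn + nmc; the model file and data list are just placeholders for a sparse regression model):

```r
library(rstan)

# Rough equivalent of burn = 10000, nmc = 10000, thin = 10:
# iter includes warmup, so iter = 10000 + 10000 = 20000.
fit <- stan(
  file   = "sparse_regression.stan",             # placeholder model file
  data   = list(N = 100, P = 500, X = X, y = y), # placeholder data list
  chains = 4,
  warmup = 10000,
  iter   = 20000,
  thin   = 10
)
```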

However, when I run stan(), I find that Stan is “extremely” slow.
In my experience, it is even slower than a simple Metropolis-Hastings algorithm.
I know that Hamiltonian Monte Carlo can take a long time with heavy-tailed priors, which frequently arise as Bayesian sparsity priors, but this seems too slow to use…

So I am curious:
what values do you usually use for (warmup, iter, thin)?

It probably depends on the context. However, afaik, 20,000 samples per chain might be a bit of overkill. If you need thinning to deal with high autocorrelation (or you are seeing divergences), that suggests there’s something iffy going on with the posterior. Maybe try tightening up the priors & sampling the default 2,000 iters per chain, with no thinning? Tightening up priors may be important given there’s not that much data.

Also, importantly, what is your hierarchical prior on the sd of the regression slopes? I’ve been able to run horseshoe & normal (i.e. “Bayesian ridge”) just fine on a dataset with ~10,000 obs and 500 predictors, with the default settings for the number of iterations & sampler controls (i.e. adapt_delta, max_treedepth).
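For reference, such a horseshoe fit with brms and all-default settings looks roughly like this (d is a placeholder data frame with the outcome y and the predictors; treat it as a sketch, not a recipe):

```r
library(brms)

# Sketch: horseshoe prior on the regression slopes, default number of
# iterations (iter = 2000, warmup = 1000) and default sampler controls.
fit <- brm(
  y ~ .,
  data   = d,                                    # placeholder data frame
  family = gaussian(),
  prior  = set_prior("horseshoe(1)", class = "b"),
  chains = 4
)
summary(fit)
```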

Thank you very much for your suggestion.
I am interested in the opposite of your situation.
In my case, the number of observations is limited to 500 at most, but the number of predictors is at least 3000, because my typical application area is gene-expression data.

Yeah, p >> n oughta make things more challenging. However, afaik, a hierarchical normal prior and the horseshoe should work well in this scenario (especially the horseshoe). I’ve heard some mixed things about the Laplace (double-exponential) prior, which is sometimes compared to the LASSO (which afaik is commonly used in this line of work), so if you’re using the Laplace prior & getting sampling problems, maybe try the horseshoe? If you’re interested in a truly sparse option & want to recover a few genes of interest, you could also try Aki Vehtari’s projpred package (https://mc-stan.org/projpred/) - from my experience with it, it works super well.
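If it helps, the projpred workflow I had in mind is roughly the following (fit is a reference model fitted with brms or rstanarm, e.g. the horseshoe fit sketched above; function names as in the projpred documentation):

```r
library(projpred)

# Sketch: projection predictive variable selection on top of a reference model.
refm <- get_refmodel(fit)    # wrap the fitted brms/rstanarm reference model
vs   <- cv_varsel(refm)      # cross-validated forward search over predictors
plot(vs, stats = "elpd")     # predictive performance vs. number of predictors
suggest_size(vs)             # heuristic for how many predictors to keep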

The horseshoe prior already has a widely used R package, “horseshoe”, which is very fast: it uses a slice sampler.
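For comparison, the call I use with the horseshoe package looks roughly like this (argument names as in the package documentation; the data here are simulated only for illustration):

```r
library(horseshoe)

# Simulated p >> n data, just for illustration
set.seed(1)
n <- 100; p <- 500
X <- matrix(rnorm(n * p), n, p)
beta <- c(rep(2, 5), rep(0, p - 5))
y <- as.vector(X %*% beta + rnorm(n))

# Slice/Gibbs sampler for linear regression under the horseshoe prior
fit <- horseshoe(y, X,
                 method.tau   = "halfCauchy",  # full Bayes for the global shrinkage scale
                 method.sigma = "Jeffreys",    # Jeffreys prior on the error variance
                 burn = 1000, nmc = 5000, thin = 1)
str(fit)  # posterior summaries and samples for beta, tau, sigma^2
```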

I am interested in identifying drawbacks of stan() in high-dimensional problems, and their remedies, by comparing it against the horseshoe prior and its variants.

Aki Vehtari’s prior is called the “regularized horseshoe”, and this is also slow when run through stan(): I tried it.

Here, what I mean by slow is that it takes 10 minutes with the default settings in Stan, whereas the “horseshoe” package takes only 2 minutes or less.

Idk, 10 minutes for a p >> n problem doesn’t seem very slow to me (I mean it’s always going to be slower than machine learning methods, but you don’t get a full posterior with those). I have no experience with the horseshoe package or slice sampling. Looking at the vignette, it seems fine; one thing that seems a bit odd is that it uses a Jeffreys prior for sigma by default, which afaik is often discouraged, although I don’t know how that plays out for the horseshoe specifically. It also doesn’t seem to implement functions for model checking - posterior predictive checks, cross-validation, etc. - so that might be one advantage of Stan/brms.
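For example, with a brms fit those checks are one-liners (assuming fit is a brmsfit object, as in the sketch above):

```r
library(brms)

# Posterior predictive check: simulated vs. observed outcome distributions
pp_check(fit, ndraws = 50)

# Approximate leave-one-out cross-validation via PSIS-LOO
loo(fit)
```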
