Correctly setting mc.cores for rstanarm models

ShanZhengYang · June 9, 2017, 9:42pm

I’m running a stan_glm() model via rstanarm on a multi-core server with +150 cores and +1 TB of RAM. It’s not clear to me that the number of cores I set is correct.

Let’s say I instantiate and run a Poisson GLM model with 10 chains and 90 cores:

library(rstanarm)
options(mc.cores = 80)
model = stan_glm(formula = A ~ B, data=my_data, family=poisson(), cores=80, chains=10)

When I run this though, it appears to approximately 10 cores.

How can I re-run this to use all 80 cores?

Operating System: Unix
Interface Version: rstanarm
Compiler/Toolkit:

weylandt · June 9, 2017, 9:49pm

Each chain can only be run on 1 core, so you are limited at 10 cores if you only run 10 chains.

(Of course, if you want to use 80 cores just for the sake of using 80 cores, you can run 80 chains but that’s probably over-kill for a simple Poisson GLM).

ShanZhengYang · June 9, 2017, 11:59pm

Each chain can only be run on 1 core, so you are limited at 10 cores if you only run 10 chains.

I see. For this simple Poisson GLM model, I am inputting a rather large dataset with many variables.

SAMPLING FOR MODEL 'model1' NOW (CHAIN 1).                                                              

Gradient evaluation took 0.98 seconds                                                                                 1000 transitions using 10 leapfrog steps per transition would take 9800 seconds.             Adjust your expectations accordingly!

After 7 days of running, it is now at

Iteration:    600 / 2000 [  30%]  (Warmup)

(1) I suspect there is no way to speed this up

(2) I’m not sure how to interpret the message “1000 transitions using 10 leapfrog steps per transition would take 9800 seconds.”. Based on this, how do users approximate how many hours/days/weeks the chains will take to run?

weylandt · June 10, 2017, 5:44pm

For (2), see Time to run

For (1), nothing immediately comes to mind. You might gain a bit by using tighter priors, but if you’re seeing slowness from large data (large n) I don’t think there’s too much to gain there. If your problem is ‘high-dimensional’ ($p \gg n$) you might see gains.

bgoodri · June 10, 2017, 6:11pm

If you have more than two variables, specifying QR = TRUE is usually going to be faster. But if you have a very large number of variables, then you often need a non-default prior that is more heavily concentrated at zero, such as hs(), hs_plus(), product_normal(), etc.

ShanZhengYang · June 10, 2017, 6:24pm

then you often need a non-default prior that is more heavily concentrated at zero, such as hs(), hs_plus(), product_normal()

If the model has a large number of parameters, I don’t follow why users would want to chose a priors which are concentrated at zero (and therefore, in the case of hs() or hs_plus() with posteriors concentrated near zero). Could you explain?

I guess perhaps I’m being abstract when I’m referring to a “large dataset”. The corresponding glm(family=Poisson()) model in R takes seconds. On the other hand, there ~40 categorical variables and the data input is on the +50 GB scale.

The priors are detailed here:

bgoodri · June 10, 2017, 10:35pm

In situations like these, usually the user believes that most of the predictors have no effect but does not know which ones. So, the priors heavily favor zero for all coefficients. Even if your beliefs are not that, you will get better predictions if your priors regularize heavily. But the supported priors are all marginally independent, so doing QR = TRUE makes that more plausible and tends to increase the sampling efficiency.

ShanZhengYang · June 11, 2017, 2:58am

Thank you, this makes sense now.

Solely in terms of performance, QR = TRUE appears to make all of the difference.

Ilan_Strauss · June 20, 2019, 5:04pm

Talking about estimating a lot of parameters (hoping I am on topic here?):

Currently running a measurement error model on X^*, the imperfectly measured surrogate for perfectly measured X. This means that the number of measurement error parameters estimated is equal to N, the number of data points. We have N = 300,000, plus a non-nested multilevel model. That is a lot of parameters.

Crazy thought: to help speed up computation and try and remove some (fatal) divergences we get when running the model on the the full N, should we put a “non-default prior [on the mean of true X] that is more heavily concentrated at zero, such as hs(), hs_plus(), product_normal(), etc”; but centre it away from zero as we expect X to be positive, with X^* = X + \epsilon?

Topic		Replies	Views
Problem parallelizing rstanarm::loo() General loo	13	5197	October 20, 2023
Stan_glm Modeling	7	1660	December 26, 2018
Parallel chains with rstan fails in Rstudio RStan	10	2610	July 5, 2020
Advice for parallelizing many Stan models with multiple chains Modeling	1	596	September 20, 2022
Significant Differences between `rstan` and `rstanarm` running times Modeling rstanarm	1	1515	August 8, 2020

Correctly setting mc.cores for rstanarm models

Related topics