Correctly setting mc.cores for rstanarm models

I’m running a stan_glm() model via rstanarm on a multi-core server with 150+ cores and 1+ TB of RAM. I’m not sure I’m setting the number of cores correctly.

Let’s say I instantiate and run a Poisson GLM with 10 chains and 80 cores:

library(rstanarm)
options(mc.cores = 80)
model <- stan_glm(formula = A ~ B, data = my_data, family = poisson(),
                  cores = 80, chains = 10)

When I run this, though, it appears to use only about 10 cores.

How can I re-run this to use all 80 cores?

Operating System: Unix
Interface Version: rstanarm
Compiler/Toolkit:

Each chain can only be run on 1 core, so you are limited to 10 cores if you only run 10 chains.

(Of course, if you want to use 80 cores just for the sake of using 80 cores, you can run 80 chains, but that’s probably overkill for a simple Poisson GLM.)
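For illustration, a minimal sketch of that chains-to-cores relationship, reusing the A ~ B formula and my_data from the question (only as many cores as there are chains are ever busy):

library(rstanarm)

# With 10 chains, at most 10 cores run in parallel; requesting 80 just leaves
# the remaining 70 idle, so matching cores to chains is enough here.
options(mc.cores = 10)
fit <- stan_glm(A ~ B, data = my_data, family = poisson(),
                chains = 10, cores = 10)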

Each chain can only be run on 1 core, so you are limited to 10 cores if you only run 10 chains.

I see. For this simple Poisson GLM model, I am inputting a rather large dataset with many variables.

SAMPLING FOR MODEL 'model1' NOW (CHAIN 1).                                                              

Gradient evaluation took 0.98 seconds
1000 transitions using 10 leapfrog steps per transition would take 9800 seconds.
Adjust your expectations accordingly!

After 7 days of running, it is now at

Iteration:    600 / 2000 [  30%]  (Warmup) 

(1) I suspect there is no way to speed this up.

(2) I’m not sure how to interpret the message “1000 transitions using 10 leapfrog steps per transition would take 9800 seconds.” Based on this, how do users approximate how many hours, days, or weeks the chains will take to run?

For (2), see Time to run
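As a rough back-of-the-envelope reading of that message (my own arithmetic, not an exact predictor): Stan timed a single gradient evaluation at 0.98 s and multiplied it by 1000 transitions × 10 leapfrog steps; scaling that to the default 2000 iterations per chain gives a very optimistic lower bound.

# Rough runtime estimate from the startup message (assumptions in comments):
grad_time <- 0.98   # seconds per gradient evaluation, as printed by Stan
leapfrog  <- 10     # assumed leapfrog steps per transition (NUTS often uses many more)
iter      <- 2000   # rstanarm default: 1000 warmup + 1000 sampling iterations
grad_time * leapfrog * iter / 3600   # ~5.4 hours per chain under these assumptions
# During adaptation the sampler frequently takes far more leapfrog steps per
# iteration, which is how a nominal ~5 hours can stretch into days.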

For (1), nothing immediately comes to mind. You might gain a bit by using tighter priors, but if you’re seeing slowness from large data (large $n$), I don’t think there’s too much to gain there. If your problem is ‘high-dimensional’ ($p \gg n$), you might see gains.

If you have more than two variables, specifying QR = TRUE is usually going to be faster. But if you have a very large number of variables, then you often need a non-default prior that is more heavily concentrated at zero, such as hs(), hs_plus(), product_normal(), etc.
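A hedged sketch of what that might look like in code; the A ~ . formula, my_data, and the choice of hs() are placeholders rather than a recommendation for this particular dataset:

library(rstanarm)

# QR = TRUE reparameterizes the design matrix (useful with more than 2 predictors);
# hs() is the regularized horseshoe, which shrinks most coefficients toward zero.
fit <- stan_glm(A ~ ., data = my_data, family = poisson(),
                prior = hs(),   # alternatives: hs_plus(), product_normal(), ...
                QR = TRUE,
                chains = 4, cores = 4)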

then you often need a non-default prior that is more heavily concentrated at zero, such as hs(), hs_plus(), product_normal()

If the model has a large number of parameters, I don’t follow why users would want to choose priors that are concentrated at zero (and therefore, in the case of hs() or hs_plus(), end up with posteriors concentrated near zero). Could you explain?

I guess perhaps I’m being vague when I refer to a “large dataset”. The corresponding glm(family = poisson()) model in R takes seconds to fit. On the other hand, there are ~40 categorical variables and the data input is on the 50+ GB scale.

The priors are detailed here:

In situations like these, usually the user believes that most of the predictors have no effect but does not know which ones. So, the priors heavily favor zero for all coefficients. Even if your beliefs are not that, you will get better predictions if your priors regularize heavily. But the supported priors are all marginally independent, so doing QR = TRUE makes that more plausible and tends to increase the sampling efficiency.
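To make that concrete (illustrative values only, not tuned recommendations), two ways to regularize more heavily than the defaults, plus a real helper for checking what was applied:

library(rstanarm)

# Heavier shrinkage on every coefficient, or a sparsity-favouring horseshoe:
prior_tight <- normal(location = 0, scale = 0.5)
prior_hs    <- hs(df = 1, global_df = 1, global_scale = 0.01)

# On a fitted stanreg object ('fit' from the sketch above), this reports the
# priors actually used after any autoscaling, i.e. how strong the shrinkage is:
prior_summary(fit)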

Thank you, this makes sense now.

Solely in terms of performance, QR = TRUE appears to make all of the difference.


Talking about estimating a lot of parameters (hoping I am on topic here?):

Currently running a measurement error model on $X^*$, the imperfectly measured surrogate for the perfectly measured $X$. This means that the number of measurement error parameters estimated is equal to $N$, the number of data points. We have $N$ = 300,000, plus a non-nested multilevel model. That is a lot of parameters.

Crazy thought: to help speed up computation and try to remove some (fatal) divergences we get when running the model on the full $N$, should we put a “non-default prior [on the mean of true $X$] that is more heavily concentrated at zero, such as hs(), hs_plus(), product_normal(), etc.”, but centre it away from zero, since we expect $X$ to be positive, with $X^* = X + \epsilon$?