Hi,
I am working with an ODE model in Stan. I use map_rect() so that each person's likelihood contribution is computed in parallel, to speed up the model.
I tested 4 short chains before running longer ones, to make sure no chain gets stuck somewhere and makes the model fitting extremely slow. I noticed that the 4 chains often don't progress in parallel, even though I set cores = min(nChains, parallel::detectCores()).
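A minimal sketch of the core budgeting behind that setting, in R. It assumes the model was compiled with threading support (-DSTAN_THREADS), in which case the STAN_NUM_THREADS environment variable controls how many threads map_rect() may use within each chain. The variable names nAvailable and threadsPerChain are illustrative, not taken from the actual script.

```r
# Sketch only: budget CPU cores between chains and map_rect threads.
nChains <- 4  # as in the test run described here

# detectCores() can return NA on some platforms, so fall back to 1.
nAvailable <- parallel::detectCores()
if (is.na(nAvailable)) nAvailable <- 1L

# One worker process per chain, capped by the available cores.
nCores <- min(nChains, nAvailable)

# Assumption: the model was compiled with threading support (-DSTAN_THREADS).
# STAN_NUM_THREADS is then the number of threads map_rect may use per chain.
threadsPerChain <- max(1L, nAvailable %/% nChains)
Sys.setenv(STAN_NUM_THREADS = threadsPerChain)

message("chains: ", nChains, ", cores: ", nCores,
        ", threads per chain: ", threadsPerChain)
```

With this budget, chains times threads per chain stays at or below the number of detected cores, so the chains and the map_rect() threads should not compete for the same CPUs. Without threading (or MPI) enabled, map_rect() evaluates its shards serially within each chain.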
Even stranger, sometimes the model does nothing (it doesn't even use any CPU) for quite a while before it starts sampling, and the sampling itself can then be quick! Below is an example from last night (100 people, 4 chains, 40 iterations for testing). I was surprised to see that the chains ran more or less sequentially rather than in parallel. I am not sure whether this is due to a memory issue (I have occasionally had stack memory problems), to the model specification (e.g. too wide a parameter space), or to map_rect(). How can I make the chains run in parallel so that the fit is faster?
#####################################
Click the Refresh button to see progress of the chains
starting worker pid=22288 on localhost:11798 at 20:33:14.974
starting worker pid=20188 on localhost:11798 at 20:33:15.315
starting worker pid=22884 on localhost:11798 at 20:33:15.629
starting worker pid=25336 on localhost:11798 at 20:33:15.930
SAMPLING FOR MODEL ‘VK_Drug_Parallel’ NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 0.031 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 310 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: WARNING: There aren’t enough warmup iterations to fit the
Chain 1: three stages of adaptation as currently configured.
Chain 1: Reducing each adaptation stage to 15%/75%/10% of
Chain 1: the given number of warmup iterations:
Chain 1: init_buffer = 3
Chain 1: adapt_window = 15
Chain 1: term_buffer = 2
Chain 1:
Chain 1: Iteration: 1 / 40 [ 2%] (Warmup)
SAMPLING FOR MODEL ‘VK_Drug_Parallel’ NOW (CHAIN 2).
Chain 2:
Chain 2: Gradient evaluation took 0.046 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 460 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2:
Chain 2:
Chain 2: WARNING: There aren’t enough warmup iterations to fit the
Chain 2: three stages of adaptation as currently configured.
Chain 2: Reducing each adaptation stage to 15%/75%/10% of
Chain 2: the given number of warmup iterations:
Chain 2: init_buffer = 3
Chain 2: adapt_window = 15
Chain 2: term_buffer = 2
Chain 2:
SAMPLING FOR MODEL ‘VK_Drug_Parallel’ NOW (CHAIN 3).
Chain 3:
Chain 3: Gradient evaluation took 0.031 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 310 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3:
Chain 3:
Chain 3: WARNING: There aren’t enough warmup iterations to fit the
Chain 3: three stages of adaptation as currently configured.
Chain 3: Reducing each adaptation stage to 15%/75%/10% of
Chain 3: the given number of warmup iterations:
Chain 3: init_buffer = 3
Chain 3: adapt_window = 15
Chain 3: term_buffer = 2
Chain 3:
Chain 3: Iteration: 1 / 40 [ 2%] (Warmup)
Chain 3: Iteration: 5 / 40 [ 12%] (Warmup)
SAMPLING FOR MODEL ‘VK_Drug_Parallel’ NOW (CHAIN 4).
Chain 4:
Chain 4: Gradient evaluation took 0.062 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 620 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4:
Chain 4:
Chain 4: WARNING: There aren’t enough warmup iterations to fit the
Chain 4: three stages of adaptation as currently configured.
Chain 4: Reducing each adaptation stage to 15%/75%/10% of
Chain 4: the given number of warmup iterations:
Chain 4: init_buffer = 3
Chain 4: adapt_window = 15
Chain 4: term_buffer = 2
Chain 4:
Chain 4: Iteration: 1 / 40 [ 2%] (Warmup)
Chain 3: Iteration: 10 / 40 [ 25%] (Warmup)
Chain 3: Iteration: 15 / 40 [ 37%] (Warmup)
Chain 3: Iteration: 20 / 40 [ 50%] (Warmup)
Chain 3: Iteration: 21 / 40 [ 52%] (Sampling)
Chain 3: Iteration: 25 / 40 [ 62%] (Sampling)
Chain 3: Iteration: 30 / 40 [ 75%] (Sampling)
Chain 3: Iteration: 35 / 40 [ 87%] (Sampling)
Chain 3: Iteration: 40 / 40 [100%] (Sampling)
Chain 3:
Chain 3: Elapsed Time: 814.961 seconds (Warm-up)
Chain 3: 482.029 seconds (Sampling)
Chain 3: 1296.99 seconds (Total)
Chain 3:
Chain 1: Iteration: 5 / 40 [ 12%] (Warmup)
Chain 1: Iteration: 10 / 40 [ 25%] (Warmup)
Chain 1: Iteration: 15 / 40 [ 37%] (Warmup)
Chain 1: Iteration: 20 / 40 [ 50%] (Warmup)
Chain 1: Iteration: 21 / 40 [ 52%] (Sampling)
Chain 1: Iteration: 25 / 40 [ 62%] (Sampling)
Chain 1: Iteration: 30 / 40 [ 75%] (Sampling)
Chain 1: Iteration: 35 / 40 [ 87%] (Sampling)
Chain 1: Iteration: 40 / 40 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 7081.22 seconds (Warm-up)
Chain 1: 277.698 seconds (Sampling)
Chain 1: 7358.92 seconds (Total)
Chain 1:
Chain 4: Iteration: 5 / 40 [ 12%] (Warmup)
Chain 4: Iteration: 10 / 40 [ 25%] (Warmup)
Chain 4: Iteration: 15 / 40 [ 37%] (Warmup)
Chain 4: Iteration: 20 / 40 [ 50%] (Warmup)
Chain 4: Iteration: 21 / 40 [ 52%] (Sampling)
Chain 4: Iteration: 25 / 40 [ 62%] (Sampling)
Chain 4: Iteration: 30 / 40 [ 75%] (Sampling)
Chain 2: Iteration: 1 / 40 [ 2%] (Warmup)
Chain 2: Iteration: 5 / 40 [ 12%] (Warmup)
Chain 4: Iteration: 35 / 40 [ 87%] (Sampling)
Chain 4: Iteration: 40 / 40 [100%] (Sampling)
Chain 4:
Chain 4: Elapsed Time: 9534.48 seconds (Warm-up)
Chain 4: 256.129 seconds (Sampling)
Chain 4: 9790.61 seconds (Total)
Chain 4:
Chain 2: Iteration: 10 / 40 [ 25%] (Warmup)
Chain 2: Iteration: 15 / 40 [ 37%] (Warmup)
Chain 2: Iteration: 20 / 40 [ 50%] (Warmup)
Chain 2: Iteration: 21 / 40 [ 52%] (Sampling)
Chain 2: Iteration: 25 / 40 [ 62%] (Sampling)
Chain 2: Iteration: 30 / 40 [ 75%] (Sampling)
Chain 2: Iteration: 35 / 40 [ 87%] (Sampling)
Chain 2: Iteration: 40 / 40 [100%] (Sampling)
Chain 2:
Chain 2: Elapsed Time: 365.391 seconds (Warm-up)
Chain 2: 206.01 seconds (Sampling)
Chain 2: 571.401 seconds (Total)
Chain 2: