Within-chain parallelization misbehaves with brms in a model with measurement errors

Gang · October 26, 2020, 3:11am

The following model specification works as expected with brms through within-chain parallelization:

brm(y ~ cond + (1|Subj), data=dat, …, backend = "cmdstanr", threads = threading(4))

It would also work fine when the measurement error for the response variable y is incorporated into the model without using within-chain parallelization:

brm(y | se(SE, sigma = TRUE) ~ cond + (1|Subj), data=dat, …)

However, when within-chain parallelization is invoked for the second model:

brm(y | se(SE, sigma=TRUE) ~ cond + (1|Subj), data=dat, …, backend = "cmdstanr", threads = threading(4))

it sputters with the following:

Chain 1 Exception: Exception: normal_lpdf: Random variable has size = 710, but Scale parameter has size 11366; and they must be the same size. (in ‘/tmp/Rtmp8GdeTj/model-336f182bf4d2.stan’, line 28, column 4 to column 73) (in ‘/tmp/Rtmp8GdeTj/model-336f182bf4d2.stan’, line 28, column 4 to column 73)
Chain 1 Exception: Exception: normal_lpdf: Random variable has size = 710, but Scale parameter has size 11366; and they must be the same size. (in ‘/tmp/Rtmp8GdeTj/model-336f182bf4d2.stan’, line 28, column 4 to column 73) (in ‘/tmp/Rtmp8GdeTj/model-336f182bf4d2.stan’, line 28, column 4 to column 73)
…
Warning: Chain 1 finished unexpectedly!
Warning: Chain 2 finished unexpectedly!
Warning: Chain 3 finished unexpectedly!
Warning: Chain 4 finished unexpectedly!
Warning: Use read_cmdstan_csv() to read the results of the failed chains.
Error in rstan::read_stan_csv(out$output_files()) :
  csvfiles does not contain any CSV file name
Calls: brm … eval2 -> eval -> eval -> .fun -> .fit_model ->
In addition: Warning messages:
1: All chains finished unexpectedly!
2: No chains finished successfully. Unable to retrieve the fit.
Execution halted

paul.buerkner · October 26, 2020, 6:41am

Looks like a bug in the stan code generation. I will check.

paul.buerkner · October 26, 2020, 7:24am

On ad hoc test data, it works with the GitHub version of brms:

dat <- data.frame(
  y = rnorm(100),
  SE = rexp(100),
  cond = sample(c("a", "b"), 100, TRUE),
  Subj = sample(1:10, 100, TRUE)
)

brm(y | se(SE, sigma=TRUE) ~ cond + (1|Subj), data=dat, 
    backend = "cmdstanr", threads = threading(4))

Please provide a minimal reproducible example of the problem you see (after checking whether it already works with the GitHub version).

Gang · October 26, 2020, 11:46am

I have updated brms to version 2.14.2 from GitHub and confirmed that it works fine with your ad hoc test data:

dat <- data.frame(
  y = rnorm(100),
  SE = rexp(100),
  cond = sample(c("a", "b"), 100, TRUE),
  Subj = sample(1:10, 100, TRUE)
)

However, it still fails with my dataset:

dat <- read.table('dat.txt', header = TRUE)
library('brms')
library('cmdstanr')
set_cmdstan_path('~/cmdstan')
options(mc.cores = parallel::detectCores())
fm <- brm(y | se(SE, sigma = TRUE) ~ cond + (1|Subj), data = dat, chains = 4, iter = 1000,
          backend = "cmdstanr", threads = threading(2))

paul.buerkner · October 26, 2020, 1:31pm

Found the problem and fixed it on github.

Gang · October 26, 2020, 1:48pm

Thanks for the quick fix, Paul!

baldwaprateek · October 28, 2020, 11:58am

Hi Paul,

I am facing a similar issue when using threading. I share the model setup below; however, I cannot share the actual model and data because of data security restrictions.

The setup below works fine without threading. I have also upgraded brms to the latest version, 2.14.2.

Please let me know if there is a workaround.

The error details are as below:

Warning: Chain 1 finished unexpectedly!

Error in rstan::read_stan_csv(out$output_files()) :
csvfiles does not contain any CSV file name
In addition: Warning message:
No chains finished successfully. Unable to retrieve the fit.

Code:

my_prior <- c(prior(normal(0,100), class='b', nlpar='Int') +
              prior(beta(6,112), class='b', nlpar='A', lb=0, ub=1) +
              prior(beta(0.24,24), class='b', nlpar='B', lb=0, ub=1) +
              prior(beta(2.15,70), class='b', nlpar='C', lb=0, ub=1) +
              prior(beta(2.15,70), class='b', nlpar='D', lb=0, ub=1) +
              prior(beta(23,202), class='b', nlpar='E', lb=0, ub=1) +
              prior(beta(23,202), class='b', nlpar='F', lb=0, ub=1) +
              prior(beta(32,232), class='b', nlpar='G', lb=0, ub=1) +
              prior(beta(32,232), class='b', nlpar='H', lb=0, ub=1) +
              prior(normal(0,100), class='b', nlpar='I') +
              prior(normal(0,100), class='b', nlpar='J') +
              prior(normal(0,100), class='b', nlpar='K') +
              prior(normal(0,100), class='b', nlpar='L') +
              prior(normal(0,100), class='b', nlpar='M') +
              prior(normal(0,100), class='b', nlpar='N') +
              prior(normal(0,100), class='b', nlpar='O'))

library(future)

plan(multiprocess, workers = 5)
start.time <- Sys.time()

model <- brm_multiple(bf(out ~ Int
                         + A1 + B1 + C1 + D1 + E1 + F1 + G1 + H1
                         + I1 + J1 + K1 + L1 + M1 + N1 + O1,
                         nl = TRUE) +
                        lf(Int ~ 1) +
                        lf(A1 ~ 0 + A) +
                        lf(B1 ~ 0 + B) +
                        lf(C1 ~ 0 + C) +
                        lf(D1 ~ 0 + D) +
                        lf(E1 ~ 0 + E) +
                        lf(F1 ~ 0 + F) +
                        lf(G1 ~ 0 + G) +
                        lf(H1 ~ 0 + H) +
                        lf(I1 ~ 0 + I) +
                        lf(J1 ~ 0 + J) +
                        lf(K1 ~ 0 + K) +
                        lf(L1 ~ 0 + L) +
                        lf(M1 ~ 0 + M) +
                        lf(N1 ~ 0 + N) +
                        lf(O1 ~ 0 + O),
                      data = df_split,
                      family = bernoulli("logit"),
                      prior = my_prior, warmup = 1000, chains = 5, cores = 5,
                      seed = 12345, iter = 2000, silent = FALSE,
                      backend = "cmdstanr", threads = threading(2))

paul.buerkner · October 28, 2020, 12:17pm

Can you provide a minimal reproducible example on fake data?

paul.buerkner · October 28, 2020, 6:26pm

Ok, after some correspondence via email, here is a reproducible example of the problem:

library(brms)

set.seed(1234)
df <- data.frame(
  Independent = rnorm(262),
  Dependent = rbinom(262, 1, 0.01)
)

my_prior <- c(prior(normal(0,100), class='b', nlpar='Int') +
              prior(beta(6,112), class='b', nlpar='Indep', lb=0, ub=1))

model <- brm(bf(Dependent ~ Int
                + Indep,
                nl = TRUE) +
               lf(Int ~ 1) +
               lf(Indep ~ 0 + Independent),
             data = df,
             family = bernoulli("logit"),
             prior = my_prior, chains = 1,
             silent = FALSE,
             backend = 'cmdstanr',
             threads = threading(2))

Compiling Stan program...
Start sampling
Running MCMC with 1 chain, with 2 thread(s) per chain...

Chain 1 Assertion failed!
Chain 1 
Chain 1 Program: C:\Users\paulb\AppData\Local\Temp\RtmpmQNTZp\filecbc529c272b_threads.exe
Chain 1 File: stan/lib/stan_math/lib/eigen_3.3.7/Eigen/src/Core/DenseCoeffsBase.h, Line 408
Chain 1 
Chain 1 Expression: index >= 0 && index < size()
Warning: Chain 1 finished unexpectedly!

Error in rstan::read_stan_csv(out$output_files()) : 
  csvfiles does not contain any CSV file name
In addition: Warning message:
 Error in rstan::read_stan_csv(out$output_files()) : 
  csvfiles does not contain any CSV file name

The error doesn’t happen without threading. I don’t understand where this error is coming from, so I would appreciate some input from other developers. Perhaps @wds15 has some ideas?

wds15 · October 28, 2020, 7:41pm

Indexing is messed up:

  real partial_log_lik(int[] seq, int start, int end, int[] Y, matrix X_Int, vector b_Int, matrix X_Indep, vector b_Indep) {
    real ptarget = 0;
    int N = end - start + 1;
    // initialize linear predictor term
    vector[N] nlp_Int = X_Int[start:end] * b_Int;
    // initialize linear predictor term
    vector[N] nlp_Indep = X_Indep[start:end] * b_Indep;
    // initialize non-linear predictor term
    vector[N] mu;
    for (n in 1:N) {
      int nn = n + start - 1;
      // compute non-linear predictor values
      mu[n] = nlp_Int[nn] + nlp_Indep[nn];
    }
    ptarget += bernoulli_logit_lpmf(Y[start:end] | mu);
    return ptarget;
  }

The nlp_Int and nlp_Indep should NOT be indexed with nn, but with n.
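
To make the mismatch concrete, here is a small base R sketch that mimics a single slice. It is not the Stan code brms generates; the vector `nlp_full` and the slice bounds are made up for illustration. One caveat: R returns NA for an out-of-range index, whereas the compiled Stan program aborts with the Eigen assertion shown above.

```r
# Illustrative only: mimic one slice of the partial log-likelihood in base R.
nlp_full <- (1:20) / 10     # stand-in for the predictor over all 20 observations
start <- 11
end <- 20
N <- end - start + 1        # 10 observations in this slice

# Like X_Int[start:end] * b_Int, the local vector covers only the slice,
# so its valid indices run from 1 to N, not from start to end.
nlp_slice <- nlp_full[start:end]

mu_wrong <- numeric(N)
mu_right <- numeric(N)
for (n in seq_len(N)) {
  nn <- n + start - 1           # global index, meant for unsliced arguments
  mu_wrong[n] <- nlp_slice[nn]  # reads positions 11..20 of a length-10 vector
  mu_right[n] <- nlp_slice[n]   # local index matches the local vector
}
```

With `start = 1` the two indices coincide, so only slices that begin further into the data run past the end of the local vectors.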

paul.buerkner · October 28, 2020, 7:46pm

You are right! Thank you

paul.buerkner · October 28, 2020, 7:55pm

The problem is now fixed in the github version of brms.

baldwaprateek · October 28, 2020, 8:13pm

Thank you, Paul, for the quick action. I no longer see the issue.

torkar · October 30, 2020, 12:55pm

@wds15 I’ll soon fit a large model (Bernoulli), so I have two questions:

1. Is it worthwhile to use within-chain parallelization with a Bernoulli likelihood?
2. I have 16 CPUs; should I then use cores = 8, chains = 4, threads = threading(2)?

wds15 · October 30, 2020, 1:05pm

Bernoulli is very hard to speed up, but you can try. If it is a hierarchical model, then you should tune the model in case you use brms (look at an issue I made in the brms repo).

I think you want cores=4, chains=4, threads=threading(2) … cores is the number of concurrent chains running I think (maybe that needs an update in notation).

torkar · October 30, 2020, 2:00pm

It would be lovely if you could add a paragraph in the current brms case study where you elaborate on the different likelihoods and the potential for speedup (the most common likelihoods of course) :) Maybe also spend some time on clarifying the cores/chains/threads again, but it could be me being a bit slow :)

wds15 · October 30, 2020, 4:33pm

Under quick summary we have

Models with computationally expensive likelihoods are easier to parallelize than less expensive likelihoods. For example, the Poisson distribution involves expensive $\log \Gamma$ functions whereas the normal likelihood is very cheap to calculate in comparison.
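
As a rough illustration of where the log-gamma term enters, the base R sketch below writes out both log densities by hand and compares them with `dpois()` and `dnorm()`. The values of `y`, `lambda`, `mu` and `sigma` are made up, and the sketch says nothing about how Stan evaluates these functions internally.

```r
# Made-up example values, for illustration only.
y <- c(0, 3, 7)
lambda <- 2.5

# Poisson log density: note the lgamma() term.
pois_manual <- y * log(lambda) - lambda - lgamma(y + 1)
pois_ok <- isTRUE(all.equal(pois_manual, dpois(y, lambda, log = TRUE)))

# Normal log density: only logs, a division and a square.
mu <- 1
sigma <- 2
norm_manual <- -0.5 * log(2 * pi) - log(sigma) - 0.5 * ((y - mu) / sigma)^2
norm_ok <- isTRUE(all.equal(norm_manual, dnorm(y, mu, sigma, log = TRUE)))
```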

Not sure how to write it more clearly?

The relationship between cores, chains and threads is also spelled out at length under the quick summary:

The example above assumes that 4 cores are available which are best used without within-chain parallelization by running 4 chains in parallel. When using within chain parallelization it is still advisable to use just as many threads in total as you have CPU cores. It’s thus sensible in this case to reduce the number of chains running in parallel to just 2, but allow each chain to use 2 threads. Obviously this will reduce the number of iterations in the posterior here as we assumed a fixed amount of 4 cores.

If that should be reworded, then maybe someone other than me should do that as I have probably looked too much at the text.

torkar · October 30, 2020, 7:59pm

I think it’s an excellent tutorial, don’t get me wrong :)

1. Are you saying that log links are a good case but logit (about which you seemed a bit uncertain) is not? Or are both logit and log links good cases, or are there distributional assumptions that should guide this?
2. So with 16 cores I assume 16 threads. Given four chains, should I use cores=16, chains=4, threads=threading(4)? Each chain would then be split into four pieces.

My initial tests using a logit link were promising. I’m just trying to understand, and as I said, the tutorial is excellent.

paul.buerkner · November 1, 2020, 11:26am

Just one quick comment: cores only applies to between-chain parallelization, that is, running different chains in parallel. Thus, you never need cores > chains.
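
A minimal sketch of the resulting arithmetic, assuming each running chain uses exactly the number of threads passed to threading(). The helper `total_threads()` is hypothetical and not part of brms or cmdstanr.

```r
# Hypothetical helper (not part of brms): number of threads busy at once.
total_threads <- function(cores, chains, threads_per_chain) {
  concurrent_chains <- min(cores, chains)  # cores beyond chains stay unused
  concurrent_chains * threads_per_chain
}

budget_a <- total_threads(cores = 4, chains = 4, threads_per_chain = 2)   # 4 chains x 2 threads
budget_b <- total_threads(cores = 16, chains = 4, threads_per_chain = 4)  # still only 4 chains at once
budget_c <- total_threads(cores = 4, chains = 4, threads_per_chain = 4)   # same load as budget_b
```

Under this assumption, a 16-CPU machine running four chains would be fully used with cores = 4, chains = 4, threads = threading(4); setting cores = 16 does not change the thread count.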
