Pystan on multiprocessing

Hi,
Here I am generating mixture Gaussian model data and using pystan as a model. I want to run pystan sampling part in multiprocessing, and I put the ‘n_jobs=20’ since my computer has 10 cores. But on the ‘task manager’, the CPU is still something like 5.9%, which means the code did not run in multiprocessing. So anyone can help me to write the code in multiprocessing? Thanks. Below is my code.

from future import division
import pystan
import pymc3 as pm
from scipy import stats
import matplotlib.pyplot as plt
import theano.tensor as tt
import seaborn as sns
import numpy as np
import os
import sys

os.environ[‘STAN_NUM_THREADS’] = “20”

generate the data

component=np.random.choice([0,1],size=50)
x=[]
loc=[[1,5],[5,2]]
s1=1
s2=1
ro=0.5
scale=[[s1*s1,ro*s1*s2],[ros1s2,s2*s2]]
scale=[scale,scale]
for i in component:
x.append(stats.multivariate_normal.rvs(mean=loc[i],cov=scale[i]))
x=np.array(x)

model=’’’
data {
int D; //number of dimensions
int K; //number of gaussians
int N; //number of data
vector[D] y[N]; //data
//real con[N]; //co ncentration

}

parameters {
simplex[K] theta; //mixing proportions
ordered[D] mu[K]; //mixture component means
cholesky_factor_corr[D] L[K]; //cholesky factor of covariance
}

transformed parameters{
cholesky_factor_corr[D] cov[K];
real sx[K];
real sy[K];
real ro[K];
// vector[N] lamba;
cov=L;
for (i in 1:K){
sx[i]=sqrt(cov[i,1,1]);
sy[i]=sqrt(cov[i,2,2]);
ro[i]=cov[i,1,2]/sqrt(cov[i,1,1])/sqrt(cov[i,2,2]);
}
// for (i in 1:N){
// lamba[i]=1/(theta[1](1./2./sx[1]/sy[1]/sqrt(1-ro[1]ro[1])exp(-1./2./(1-ro[1]ro[1])((y[i,1]-mu[1,1])(y[i,1]-mu[1,1])/sx[1]/sx[1])+(y[i,2]-mu[1,2])(y[i,2]-mu[1,2])/sy[1]/sy[1]-2.ro[1](y[i,1]-mu[1,1])(y[i,2]-mu[1,2])/sx[1]/sy[1]))+theta[2](1./2./sx[2]/sy[2]/sqrt(1-ro[2]ro[2])exp(-1./2./(1-ro[2]ro[2])((y[i,1]-mu[2,1])(y[i,1]-mu[2,1])/sx[2]/sx[2])+(y[i,2]-mu[2,2])(y[i,2]-mu[2,2])/sy[2]/sy[2]-2.ro[2](y[i,1]-mu[2,1])(y[i,2]-mu[2,2])/sx[2]/sy[2])));
//}
}

model {
real ps[K];

for(k in 1:K){
mu[k] ~ uniform(-10,10);
L[k] ~ lkj_corr_cholesky(2);
//con ~ exponential(lamba);

}

for (n in 1:N){
for (k in 1:K){
ps[k] = log(theta[k])+multi_normal_cholesky_lpdf(y[n] | mu[k], L[k]); //increment log probability of the gaussian
}
target += log_sum_exp(ps);
}
// for(i in 1:N){
// target += - lamba[i]*con[i];
//}

}

‘’’
if name == ‘main’:
dat={‘N’:50,‘y’:x,‘D’:2,‘K’:2}

fit=pystan.stan(model_code=model, data=dat, iter=5000, warmup=500, chains=1,n_jobs=20)

print(fit)

For threading, you need to use map_rect function. See https://github.com/stan-dev/math/wiki/Threading-Support

And for a good tutorial https://github.com/rmcelreath/cmdstan_map_rect_tutorial

On PyStan n_jobs and num_stan_threads are separate things.

n_jobs <= chains

See https://pystan.readthedocs.io/en/latest/threading_support.html

ps. To format your multiline codeblock use 3 ``` at start and end.

Hi,
I have added a map_rect function (the bold code), but it gives me a error. Could you help me fix that? Thanks.
My new code is below:


from __future__  import division
import pystan
import pymc3 as pm
from scipy import stats
import matplotlib.pyplot as plt
import theano.tensor as tt
import seaborn as sns
import numpy as np
component=np.random.choice([0,1],size=500)
x=[]
loc=[[1,5],[5,2]]
s1=1
s2=1
ro=0.5
scale=[[s1*s1,ro*s1*s2],[ro*s1*s2,s2*s2]]
scale=[scale,scale]

* List item

for i in component:
  x.append(stats.multivariate_normal.rvs(mean=loc[i],cov=scale[i]))
x=np.array(x)


c=[]
for i in range(len(x)):
 c.append(stats.expon.rvs(scale=1/(0.5*stats.multivariate_normal.pdf(x[i],mean=loc[0],cov=scale[0])+0.5*stats.multivariate_normal.pdf(x[i],mean=loc[1],cov=scale[1]))))

x=np.array(x)
**model='''**
**functions {**
**  vector bl_glm(real[] theta, vector[] mu, cholesky_factor_corr[] L, vector[] y) {**
**  real K=theta.size();**
**  real N=y.size();**
**  real ps[K];**
** for (k in 1:K){**
** ps[k] = log(theta[k])+multi_normal_cholesky_lpdf(y | mu[k], L[k]); //increment log probability of the gaussian**
** }**
**return [ps]';**
**}**


data {
 int D; //number of dimensions
 int K; //number of gaussians
 int N; //number of data
 vector[D] y[N]; //data

}

parameters {
 real[K] theta; //mixing proportions
 vector[D] mu[K]; //mixture component means
 cholesky_factor_corr[D] L[K]; //cholesky factor of covariance
}

model {
 real ps[K];

 for(k in 1:K){
 mu[k] ~ normal(0,4);
 L[k] ~ lkj_corr_cholesky(4);
 }

 
  target += sum(map_rect(bl_glm, mu, L,  y,  con);
}

'''


dat={'N':500,'y':x,'D':2,'K':2}#,'con':c}
extra_compile_args = ['-pthread', '-DSTAN_THREADS']

stan_model = pystan.StanModel(
   model_code=model,
   extra_compile_args=extra_compile_args)

fit = stan_model.sampling(data=dat, n_jobs=4)

print(fit)

The error is: 
Forget to mention the error is below:
ValueError: Failed to parse Stan model 'anon_model_fc39d0e82b2a948e69d5573a0f8031ea'. Error message:
SYNTAX ERROR, MESSAGE(S) FROM PARSER:

  error in 'unknown file name' at line 3, column 42
  -------------------------------------------------
     1: 
     2: functions {
     3:   vector bl_glm(real[] theta, vector[] mu, cholesky_factor_corr[] L, vector[] y) {
                                                 ^
     4:   real K=theta.size();
  -------------------------------------------------

PARSER EXPECTED: <argument declaration or close paren ) to end argument declarations>

Hello,
Could you help with map_rect function? I have wrote below and it gives errors. Thanks.

Hi,

This is missing parenthesis.

target += sum(map_rect(bl_glm, mu, L, y, con);

But do you really need bl_glm function?

You need to create a function that calculates an item in ps.

For description of the issue see tutorial https://github.com/rmcelreath/cmdstan_map_rect_tutorial

and our Stan 2.18 User Guide (chapter 22)

Hi,
why you think bl_glm function is not necessary? Is it used for multiprocess? I have add the parenthesis, but it still give the same error. Could you help check whether the way below is right?


functions {
vector bl_glm(real[] theta, vector[] mu, cholesky_factor_corr[] L, vector[] y) {real K=theta.size(); 
real N=y.size();

Please read the User guide for map reduce. Also, there are more discussion on the Discourse on the parallel computing.

In your original example there was no bl_glm but it was in the example showing map_rect usage.

Also, just want to make sure we are on the same page, I am running the code on windows system and using python 2.7.13. I read some website says pystan could not work on multiprocessing on windows. Do you think could that be a issue? Thanks.

Yes, multithreading is not supported on Windows.

Multiprocessing (n_jobs) works with MingW-w64 by distributing chains on parallel cpu. (Python 3). Python 2 does not support multiprocessing on Windows.

Remember to call sampling under __main__ block

if __name__ == '__main__':
    code here

see https://pystan.readthedocs.io/en/latest/windows.html

So multithreading is for running iterations in different CPU’s right? Do you have any suggestions for running multithreading on Windows? Or is it not possible at all? As an alternative, is multithreading in stan in R possible on Windows?

Multiprocessing runs chains on individual cpu.

Multithreading parallelizes specific calculations to multiple cpu.

If you really want to do it on Windows there is a experimental (not official) tutorial that I made. It uses MSVC 2017 + clang to circumvent the thread_local bug on MinGW-w64/winpthreads.

See Multithreading on Windows with PyStan 2.18+

1 Like

Hi. I am wondering if it is possible to hand write a C++ code for multithreading in the pystan model=’’’ part of the code instead of trying the experimental way to be able to use multithreading in the module? When my code includes a lot of data, it takes ~30 minutes to run on Windows with only 10,000 iterations so I would like to try the best way to decrease the time. Thanks!

What do you mean by multithreading? Not map_rect?

Theading things are done in Stan side, so you would need to touch Stan source. For this I recommend that you clone our repo and install in place.

git clone --recursive https://github.com/stan-dev/pystan
cd pystan
pip install -e .

For threading discussion, please read other discussion going on this forum.

Ps. PyStan 3 uses threading to run chains parallel, but that is not probably the thing you are looking for.

Edit. You get cpp code with stanc and you can edit this after saving. Then you can read the modified cpp code and replace in stanc output that can then go to StanModel init.

Yes I am concerned about running iterations on different CPU, not different chains on different CPUs. So there is no way to write the multithreading for running different iterations in the model code? I would have to edit the source code?

I mean that currently map_rect is the way to go if your model can be parallelized.

Otherwise you need to handle C++ and Stan types.

There might be other ways to speed up your model (setting good priors etc).