# Pystan on multiprocessing

Hi,
Here I am generating mixture Gaussian model data and using pystan as a model. I want to run pystan sampling part in multiprocessing, and I put the ‘n_jobs=20’ since my computer has 10 cores. But on the ‘task manager’, the CPU is still something like 5.9%, which means the code did not run in multiprocessing. So anyone can help me to write the code in multiprocessing? Thanks. Below is my code.

from future import division
import pystan
import pymc3 as pm
from scipy import stats
import matplotlib.pyplot as plt
import theano.tensor as tt
import seaborn as sns
import numpy as np
import os
import sys

# generate the data

component=np.random.choice([0,1],size=50)
x=[]
loc=[[1,5],[5,2]]
s1=1
s2=1
ro=0.5
scale=[[s1*s1,ro*s1*s2],[ros1s2,s2*s2]]
scale=[scale,scale]
for i in component:
x.append(stats.multivariate_normal.rvs(mean=loc[i],cov=scale[i]))
x=np.array(x)

model=’’’
data {
int D; //number of dimensions
int K; //number of gaussians
int N; //number of data
vector[D] y[N]; //data
//real con[N]; //co ncentration

}

parameters {
simplex[K] theta; //mixing proportions
ordered[D] mu[K]; //mixture component means
cholesky_factor_corr[D] L[K]; //cholesky factor of covariance
}

transformed parameters{
cholesky_factor_corr[D] cov[K];
real sx[K];
real sy[K];
real ro[K];
// vector[N] lamba;
cov=L;
for (i in 1:K){
sx[i]=sqrt(cov[i,1,1]);
sy[i]=sqrt(cov[i,2,2]);
ro[i]=cov[i,1,2]/sqrt(cov[i,1,1])/sqrt(cov[i,2,2]);
}
// for (i in 1:N){
// lamba[i]=1/(theta[1](1./2./sx[1]/sy[1]/sqrt(1-ro[1]ro[1])exp(-1./2./(1-ro[1]ro[1])((y[i,1]-mu[1,1])(y[i,1]-mu[1,1])/sx[1]/sx[1])+(y[i,2]-mu[1,2])(y[i,2]-mu[1,2])/sy[1]/sy[1]-2.ro[1](y[i,1]-mu[1,1])(y[i,2]-mu[1,2])/sx[1]/sy[1]))+theta[2](1./2./sx[2]/sy[2]/sqrt(1-ro[2]ro[2])exp(-1./2./(1-ro[2]ro[2])((y[i,1]-mu[2,1])(y[i,1]-mu[2,1])/sx[2]/sx[2])+(y[i,2]-mu[2,2])(y[i,2]-mu[2,2])/sy[2]/sy[2]-2.ro[2](y[i,1]-mu[2,1])(y[i,2]-mu[2,2])/sx[2]/sy[2])));
//}
}

model {
real ps[K];

for(k in 1:K){
mu[k] ~ uniform(-10,10);
L[k] ~ lkj_corr_cholesky(2);
//con ~ exponential(lamba);

}

for (n in 1:N){
for (k in 1:K){
ps[k] = log(theta[k])+multi_normal_cholesky_lpdf(y[n] | mu[k], L[k]); //increment log probability of the gaussian
}
target += log_sum_exp(ps);
}
// for(i in 1:N){
// target += - lamba[i]*con[i];
//}

}

‘’’
if name == ‘main’:
dat={‘N’:50,‘y’:x,‘D’:2,‘K’:2}

fit=pystan.stan(model_code=model, data=dat, iter=5000, warmup=500, chains=1,n_jobs=20)

print(fit)

And for a good tutorial https://github.com/rmcelreath/cmdstan_map_rect_tutorial

On PyStan n_jobs and num_stan_threads are separate things.

``````n_jobs <= chains
``````

ps. To format your multiline codeblock use 3 ``` at start and end.

Hi,
I have added a map_rect function (the bold code), but it gives me a error. Could you help me fix that? Thanks.
My new code is below:

``````from __future__  import division
import pystan
import pymc3 as pm
from scipy import stats
import matplotlib.pyplot as plt
import theano.tensor as tt
import seaborn as sns
import numpy as np
component=np.random.choice([0,1],size=500)
x=[]
loc=[[1,5],[5,2]]
s1=1
s2=1
ro=0.5
scale=[[s1*s1,ro*s1*s2],[ro*s1*s2,s2*s2]]
scale=[scale,scale]

* List item

for i in component:
x.append(stats.multivariate_normal.rvs(mean=loc[i],cov=scale[i]))
x=np.array(x)

c=[]
for i in range(len(x)):
c.append(stats.expon.rvs(scale=1/(0.5*stats.multivariate_normal.pdf(x[i],mean=loc[0],cov=scale[0])+0.5*stats.multivariate_normal.pdf(x[i],mean=loc[1],cov=scale[1]))))

x=np.array(x)
**model='''**
**functions {**
**  vector bl_glm(real[] theta, vector[] mu, cholesky_factor_corr[] L, vector[] y) {**
**  real K=theta.size();**
**  real N=y.size();**
**  real ps[K];**
** for (k in 1:K){**
** ps[k] = log(theta[k])+multi_normal_cholesky_lpdf(y | mu[k], L[k]); //increment log probability of the gaussian**
** }**
**return [ps]';**
**}**

data {
int D; //number of dimensions
int K; //number of gaussians
int N; //number of data
vector[D] y[N]; //data

}

parameters {
real[K] theta; //mixing proportions
vector[D] mu[K]; //mixture component means
cholesky_factor_corr[D] L[K]; //cholesky factor of covariance
}

model {
real ps[K];

for(k in 1:K){
mu[k] ~ normal(0,4);
L[k] ~ lkj_corr_cholesky(4);
}

target += sum(map_rect(bl_glm, mu, L,  y,  con);
}

'''

dat={'N':500,'y':x,'D':2,'K':2}#,'con':c}

stan_model = pystan.StanModel(
model_code=model,
extra_compile_args=extra_compile_args)

fit = stan_model.sampling(data=dat, n_jobs=4)

print(fit)

The error is:
Forget to mention the error is below:
ValueError: Failed to parse Stan model 'anon_model_fc39d0e82b2a948e69d5573a0f8031ea'. Error message:
SYNTAX ERROR, MESSAGE(S) FROM PARSER:

error in 'unknown file name' at line 3, column 42
-------------------------------------------------
1:
2: functions {
3:   vector bl_glm(real[] theta, vector[] mu, cholesky_factor_corr[] L, vector[] y) {
^
4:   real K=theta.size();
-------------------------------------------------

PARSER EXPECTED: <argument declaration or close paren ) to end argument declarations>``````

Hello,
Could you help with map_rect function? I have wrote below and it gives errors. Thanks.

Hi,

This is missing parenthesis.

`target += sum(map_rect(bl_glm, mu, L, y, con);`

But do you really need `bl_glm` function?

You need to create a function that calculates an item in `ps`.

For description of the issue see tutorial https://github.com/rmcelreath/cmdstan_map_rect_tutorial

and our Stan 2.18 User Guide (chapter 22)

Hi,
why you think bl_glm function is not necessary? Is it used for multiprocess? I have add the parenthesis, but it still give the same error. Could you help check whether the way below is right?

``````functions {
vector bl_glm(real[] theta, vector[] mu, cholesky_factor_corr[] L, vector[] y) {real K=theta.size();
real N=y.size();``````

Please read the User guide for map reduce. Also, there are more discussion on the Discourse on the parallel computing.

In your original example there was no `bl_glm` but it was in the example showing `map_rect` usage.

Also, just want to make sure we are on the same page, I am running the code on windows system and using python 2.7.13. I read some website says pystan could not work on multiprocessing on windows. Do you think could that be a issue? Thanks.

Yes, multithreading is not supported on Windows.

Multiprocessing (`n_jobs`) works with MingW-w64 by distributing chains on parallel cpu. (Python 3). Python 2 does not support multiprocessing on Windows.

Remember to call sampling under __main__ block

``````if __name__ == '__main__':
code here
``````

So multithreading is for running iterations in different CPU’s right? Do you have any suggestions for running multithreading on Windows? Or is it not possible at all? As an alternative, is multithreading in stan in R possible on Windows?

Multiprocessing runs chains on individual cpu.

Multithreading parallelizes specific calculations to multiple cpu.

If you really want to do it on Windows there is a experimental (not official) tutorial that I made. It uses MSVC 2017 + clang to circumvent the thread_local bug on MinGW-w64/winpthreads.

1 Like

Hi. I am wondering if it is possible to hand write a C++ code for multithreading in the pystan model=’’’ part of the code instead of trying the experimental way to be able to use multithreading in the module? When my code includes a lot of data, it takes ~30 minutes to run on Windows with only 10,000 iterations so I would like to try the best way to decrease the time. Thanks!

What do you mean by multithreading? Not `map_rect`?

Theading things are done in Stan side, so you would need to touch Stan source. For this I recommend that you clone our repo and install in place.

``````git clone --recursive https://github.com/stan-dev/pystan
cd pystan
pip install -e .
``````

Edit. You get cpp code with `stanc` and you can edit this after saving. Then you can read the modified cpp code and replace in stanc output that can then go to `StanModel` init.
I mean that currently `map_rect` is the way to go if your model can be parallelized.