I am running this tutorial using PyStan:
http://data.princeton.edu/pop510/hospStan.html
One of the input data declarations is row_vector[K] x[N]; // predictors
I read the txt file using numpy.genfromtxt and converted the 4 designated columns to an array of tuples, just to try it out, knowing it's probably not correct. There is no obvious way to convert to an array of row_vectors.
The pystan.stan() call fails with
accessing element out of range. index 0 out of range;
I can't find anywhere in the manual that explains how to feed data into a row_vector[K] from a Python client. Can someone please help explain?
So after you have read the data from hospital.txt with np.genfromtxt, you have an N x D numpy array (floats).
import numpy as np
hosp = np.genfromtxt("./hospital.txt", skip_header=1, skip_footer=1)
Then you can slice the data into a dictionary in almost the same way as the R example does.
# R (list)
hosp_data <- list(N=nrow(hosp),M=501,K=4,y=hosp[,1],x=hosp[,2:5],g=hosp[,6])
# Python (dictionary)
hosp_data = dict(N=len(hosp), M=501, K=4, y=hosp[:, 0].astype(int), x=hosp[:, 1:5], g=hosp[:, 5].astype(int))
Here x is an N x K numpy array.
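To make the shapes concrete, here is a small sketch that uses a made-up array in place of hospital.txt. The values and the helper name make_hosp_data are invented for illustration; only the column layout follows the dictionary above. Each row of the 2-D x array plays the role of one row_vector[K].

```python
import numpy as np

def make_hosp_data(hosp, n_groups):
    """Build the Stan data dictionary from an N x 6 float array.

    Column layout assumed from the example above:
    column 0 = y, columns 1-4 = predictors, column 5 = group index.
    """
    hosp = np.asarray(hosp, dtype=float)
    return dict(
        N=hosp.shape[0],
        M=n_groups,
        K=4,
        y=hosp[:, 0].astype(int),   # integer outcome
        x=hosp[:, 1:5],             # N x K block, one row per observation
        g=hosp[:, 5].astype(int),   # integer group index
    )

# Made-up rows, not the real hospital data
fake = np.array([
    [1, 0.5, 0.25, 1.0, 0.0, 1],
    [0, 1.5, 2.25, 0.0, 1.0, 1],
    [1, -0.5, 0.25, 0.0, 0.0, 2],
])
data = make_hosp_data(fake, n_groups=2)
print(data["x"].shape)
```

Checking data["x"].shape against (N, K) before calling Stan is a quick way to catch an off-by-one column slice, which is the kind of mistake that tends to surface as an index out of range error.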
Thank you sir. Works like a charm.
Could you also help me with
print(hfit, pars=c("alpha","beta[1]","beta[2]","beta[3]","beta[4]","sigma"),
and
traceplot(hfit,c("alpha","beta[1]","beta[2]","beta[3]","beta[4]","sigma"),
ncol=1,nrow=6,inc_warmup=F)
I can't find the PyStan equivalent. I tried hfit.summary() and hfit.traceplot(), but nothing gets printed.
This is apparently one of the challenges of working with PyStan instead of RStan: it's hard to find references.
You can do print(hfit). The problem with this is that you cannot choose which parameters are shown.
Also, hfit.plot(pars=('alpha', 'beta')) gives you the traceplots, but again the problem is that you cannot define the columns / rows, so the 'beta' variables are plotted in the same figure.
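Until the plotting code changes, one workaround is to draw the traces yourself with matplotlib, one row per scalar parameter. The sketch below is not part of PyStan: flatten_draws and plot_traces are hypothetical helpers, and they assume you already have the draws, in sampling order, as a dictionary mapping each parameter name to an array with the draw on the first axis. The draws used here are random numbers, not output from the model.

```python
import numpy as np

def flatten_draws(draws):
    """Split each parameter into scalar series named like 'beta[1]'.

    `draws` maps a parameter name to an array whose first axis is the draw.
    Vector parameters get 1-based element labels; arrays with more than two
    dimensions are flattened, so their labels are plain running indices.
    """
    series = {}
    for name, values in draws.items():
        values = np.asarray(values)
        if values.ndim == 1:
            series[name] = values
        else:
            flat = values.reshape(values.shape[0], -1)
            for j in range(flat.shape[1]):
                series["{}[{}]".format(name, j + 1)] = flat[:, j]
    return series

def plot_traces(series, filename):
    """Draw one trace per row (a single column) and save it to `filename`."""
    import matplotlib.pyplot as plt  # imported here so the helper above works without it

    fig, axes = plt.subplots(nrows=len(series), ncols=1, sharex=True,
                             squeeze=False, figsize=(8, 2 * len(series)))
    for ax, (name, values) in zip(axes[:, 0], series.items()):
        ax.plot(values)
        ax.set_ylabel(name)
    fig.savefig(filename)
    plt.close(fig)

# Made-up draws standing in for real samples
rng = np.random.RandomState(0)
fake_draws = {
    "alpha": rng.normal(size=100),
    "beta": rng.normal(size=(100, 4)),
    "sigma": rng.normal(size=100),
}
series = flatten_draws(fake_draws)
print(list(series))
```

Calling plot_traces(series, "traces.png") then gives a 6 x 1 grid, similar in layout to the ncol=1, nrow=6 call in R.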
These are known issues (see https://github.com/stan-dev/pystan/issues/357 and https://github.com/stan-dev/pystan/issues/201 for print(fit)). For the plot problem, the idea is that we update our plotting code and move to mcmcplotlib.
Hi, I made PR #359 (see link) to enable selecting the parameters in print (or something close to that).
print(pystan.misc._print_stanfit(fit, pars=['alpha', 'beta[0]', 'sigma'], probs=(0.025, 0.50, 0.975), digits_summary=3))
For the current situation, you can create a pandas DataFrame from the summary.
import pandas as pd
summary_dict = hfit.summary()
summary_df = pd.DataFrame(data=summary_dict['summary'],
                          index=summary_dict['summary_rownames'],
                          columns=summary_dict['summary_colnames'])
or for each chain
chain_summary_list = []
for chain in range(summary_dict['c_summary'].shape[-1]):
    # one table per chain; the chain number is appended to each row name
    table_chain = pd.DataFrame(data=summary_dict['c_summary'][:, :, chain],
                               index=["{}_{}".format(name, chain) for name in summary_dict['c_summary_rownames']],
                               columns=summary_dict['c_summary_colnames'])
    chain_summary_list.append(table_chain)
# look up the summary of a specific chain (here the first one)
chain_summary_list[0]
# or create one dataframe from the per-chain dataframes
summary_ = pd.concat(chain_summary_list, axis=0)
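Once the summary is a DataFrame, picking out a subset of parameters (roughly what pars= does in RStan's print) is a row selection. A minimal sketch follows, assuming the dictionary has the 'summary', 'summary_rownames' and 'summary_colnames' keys used above. The helper name summary_frame and the numbers in fake_summary are made up for illustration.

```python
import numpy as np
import pandas as pd

def summary_frame(summary_dict, pars=None):
    """Return the summary as a DataFrame, optionally limited to `pars`.

    A name such as 'beta' also selects its elements ('beta[1]', 'beta[2]', ...).
    """
    df = pd.DataFrame(
        data=summary_dict["summary"],
        index=summary_dict["summary_rownames"],
        columns=summary_dict["summary_colnames"],
    )
    if pars is None:
        return df
    keep = [
        name for name in df.index
        if any(name == p or name.startswith(p + "[") for p in pars)
    ]
    return df.loc[keep]

# Made-up summary with the same layout as hfit.summary(); not real results
fake_summary = {
    "summary": np.arange(12, dtype=float).reshape(4, 3),
    "summary_rownames": np.array(["alpha", "beta[1]", "beta[2]", "sigma"]),
    "summary_colnames": ("mean", "sd", "n_eff"),
}
table = summary_frame(fake_summary, pars=["alpha", "beta"])
print(table.round(3))
```

Matching on the name prefix rather than on a fixed label such as 'beta[0]' keeps the selection independent of whether the element labels start at 0 or 1.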
I hope these examples will work for you.
I am going to back off pyStan in favor of RStan until pyStan catches up with the plot and print functions a bit.