Transitioning from pystan 2.x to 3.x

Hey all, I am fairly new to Stan (I only started using it this year). For most of the year I used PyStan 2.19, but I have now moved to PyStan 3.x. I need some clarity on the differences in function calls and on how to specify additional parameters such as adapt_delta and max_treedepth. My models typically produced false divergences, so I always set adapt_delta = 0.99. I also saved information such as the stepsize and inv_metric so that I could do additional runs if required.

I have familiarized myself with the basics of 3.x: instead of pystan.StanModel() we use stan.build(), and instead of model.sampling() we use model.sample().

In PyStan 2.19, adapt_delta and max_treedepth are set through the control dict argument. I just want to know:

  1. How can I change these variables in 3.x?
  2. How do I access the inv_metric and stepsize after adaptation (warmup) finishes, and how do I specify these for additional runs?
  3. How do I access the log posterior ‘lp__’ for each sample?
  4. How do I get convergence metrics, rhat and neff?

I’ve read the documentation for PyStan 3.7 but could not find any information on these.

Hi, yes, there are a couple of changes in PyStan 3 compared with PyStan 2. Mainly, PyStan 3 supports the main functionality and has dropped support for all the experimental stuff.

If you need these properties, I recommend using CmdStanPy, which has broader support for experimental properties.
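As a rough sketch of what reusing a tuned sampler could look like in CmdStanPy: the helper below only assembles keyword arguments for CmdStanModel.sample() from per-chain step sizes and inverse metrics. The helper name fixed_adaptation_kwargs is hypothetical, and the argument names (chains, iter_warmup, adapt_engaged, step_size, metric) as well as the fit.step_size and fit.metric properties reflect my understanding of CmdStanPy 1.x, so check them against your installed version.

```python
def fixed_adaptation_kwargs(step_sizes, inv_metrics):
    """Build sample() keyword arguments that reuse an earlier adaptation.

    step_sizes:  one step size per chain
    inv_metrics: one diagonal inverse metric (sequence of floats) per chain
    """
    if len(step_sizes) != len(inv_metrics):
        raise ValueError("need one step size and one inverse metric per chain")
    return {
        "chains": len(step_sizes),
        "iter_warmup": 0,        # no warmup iterations
        "adapt_engaged": False,  # keep step size and metric fixed
        "step_size": [float(s) for s in step_sizes],
        "metric": [{"inv_metric": [float(v) for v in m]} for m in inv_metrics],
    }


# Usage sketch (assumes cmdstanpy is installed; "model.stan" and data are
# placeholders for your own model file and data dict):
#
#   from cmdstanpy import CmdStanModel
#   model = CmdStanModel(stan_file="model.stan")
#   first_fit = model.sample(data=data, adapt_delta=0.99, max_treedepth=12)
#   kwargs = fixed_adaptation_kwargs(first_fit.step_size, first_fit.metric)
#   more_draws = model.sample(data=data, **kwargs)
```

Initial values for the continued run (for example the posterior means of the first fit) would be passed separately through the inits argument.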

  1. PyStan uses similar keywords to CmdStan; see https://github.com/stan-dev/httpstan/blob/main/httpstan/services/cmdstan-help-all.json

  2. For the inverse metric I recommend CmdStanPy. The stepsize is available as fit["stepsize__"]. ArviZ InferenceData also has support for these.

  3. fit["lp__"]

  4. I recommend using ArviZ; see arviz.from_pystan in the ArviZ 0.16.1 documentation.
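To make points 1 and 3 concrete, here is a minimal sketch that renames the PyStan 2 control keys to the CmdStan-style names. The helper control_to_kwargs and the mapping table are hypothetical names of mine; delta and max_depth are the names I believe the linked cmdstan-help-all.json uses for adapt_delta and max_treedepth, so verify them there before relying on this.

```python
# PyStan 2 control-dict keys and the CmdStan-style names assumed to replace them.
CONTROL_TO_CMDSTAN = {
    "adapt_delta": "delta",
    "max_treedepth": "max_depth",
}


def control_to_kwargs(control):
    """Rename PyStan 2 control keys; any other key is passed through unchanged."""
    return {CONTROL_TO_CMDSTAN.get(key, key): value for key, value in control.items()}


# Usage sketch (assumes pystan 3 is installed; program_code and data are yours):
#
#   import stan
#   posterior = stan.build(program_code, data=data)
#   fit = posterior.sample(
#       num_chains=4,
#       num_samples=1000,
#       **control_to_kwargs({"adapt_delta": 0.99, "max_treedepth": 12}),
#   )
#   lp = fit["lp__"]              # log posterior for every saved draw
#   stepsize = fit["stepsize__"]  # stepsize reported for every saved draw
```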

I recommend also adding named dimensions and coordinates where suitable and then saving the results as netcdf4.

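import arviz as az

# Convert the PyStan 3 fit into an ArviZ InferenceData object
inferencedata = az.from_pystan(posterior=fit, posterior_model=model)
# The summary table includes r_hat and effective sample sizes (ess_bulk, ess_tail)
summary = az.summary(inferencedata)
print(summary)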

Thanks @ahartikainen. Looking at the cmdstan-help-all.json file cleared up a lot of confusion I had. Just to clarify a few aspects.

  1. I plotted the stepsize, and from this I concluded that the stepsize array stores num_chains*num_samples elements, but only the first num_chains are distinct stepsizes and the rest are repeats? By the same logic, are the posterior samples in the fit dict ordered by chain as 1, 2, 3, 4, 1, 2, 3, 4, … and not as 1, 1, …, 2, 2, …, 3, 3, …, 4, 4, … as with the PyStan 2.19 fit.extract() method?

  2. I see that cmdstan-help-all.json has an engaged parameter. If I specify num_warmup=0, do I still need to specify engaged=False (for additional runs without adaptation)? When I do this I get the error “ValueError: {‘json’: {‘engaged’: [‘Unknown field.’]}}”.

  3. When I try to set metric=np.array([1, 1, 1]) (for a simple linear regression problem with slope, intercept and variance as model parameters, just to check that I understand the interface correctly), I again get an error: “ValueError: {‘json’: {‘metric’: [‘Unknown field.’]}}”.

  4. Q2-3 lead into my final question. When you say you recommend using CmdStanPy, is this only for extracting the metric, or also for drawing additional samples from an already tuned fit? I.e. using the metric and stepsize as fixed parameters, with some init (probably the sample means) taken from the fit?
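If the interleaved ordering described in point 1 is right (this is an assumption taken from the observation above, not something confirmed in this thread), the flat draws can be regrouped per chain with a stride. The function name split_by_chain is hypothetical, and it expects a one-dimensional sequence, e.g. a single row of a fit array.

```python
def split_by_chain(flat_draws, num_chains):
    """Regroup draws stored as chain 1, 2, ..., K, 1, 2, ..., K into one list per chain."""
    if num_chains <= 0 or len(flat_draws) % num_chains != 0:
        raise ValueError("number of draws must be a positive multiple of num_chains")
    # Every num_chains-th element, starting at offset c, belongs to chain c.
    return [list(flat_draws[c::num_chains]) for c in range(num_chains)]


# Two draws from each of four chains, stored interleaved.
flat = [10, 20, 30, 40, 11, 21, 31, 41]
per_chain = split_by_chain(flat, num_chains=4)
print(per_chain)
```

Applied to the stepsize array, each per-chain list should then contain a single repeated value if the stepsize is constant after warmup, which gives a quick check of the assumed ordering.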

There is a nice summary of the differences here: CmdStanPy 1.0 - #3 by WardBrian