Subclassing Cmdstanpy

Funko_Unko · March 16, 2021, 7:48pm

I don’t want to derail the other thread any further (Cookiecutter-cmdstanpy: a template for cmdstanpy projects; sorry, @Teddy_Groves1).

@ahartikainen Yes, a lot of the functions are mere convenience functions. For the timings, e.g., I just parse all stdout files and extract the warmup and sampling timings, if available.
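A minimal sketch of that kind of timing extraction, assuming the usual summary CmdStan prints at the end of each chain (`Elapsed Time: ... seconds (Warm-up)` followed by `... seconds (Sampling)`). The function names are hypothetical. A chain that was killed before finishing simply yields `None` entries.

```python
import re
from pathlib import Path
from typing import Dict, Iterable, Optional

# Matches the end-of-run summary CmdStan writes to stdout, e.g.
#  Elapsed Time: 0.026 seconds (Warm-up)
#                0.025 seconds (Sampling)
_WARMUP = re.compile(r"([0-9.eE+-]+) seconds \(Warm-up\)")
_SAMPLING = re.compile(r"([0-9.eE+-]+) seconds \(Sampling\)")


def parse_timings(text: str) -> Dict[str, Optional[float]]:
    """Extract warmup/sampling seconds from one chain's stdout text.

    Entries that are not present (e.g. an unfinished chain) are None.
    """
    warmup = _WARMUP.search(text)
    sampling = _SAMPLING.search(text)
    return {
        "warmup": float(warmup.group(1)) if warmup else None,
        "sampling": float(sampling.group(1)) if sampling else None,
    }


def timings_from_files(paths: Iterable[str]) -> Dict[str, Dict[str, Optional[float]]]:
    """Parse every stdout file, keyed by its path."""
    return {str(p): parse_timings(Path(p).read_text()) for p in paths}
```

The stdout paths themselves can be taken from the fit object; recent cmdstanpy versions expose them as `fit.runset.stdout_files`, but check what your version provides.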

Once I had subclasses for my main functionality, the barrier to adding extra functionality as member functions was quite low.

Funko_Unko · March 16, 2021, 7:51pm

Is there a canonical way to do this?

Edit: meaning, I have some run of a model, and I want to pick up exactly where I left off and continue sampling or warming up.

mitzimorris · March 17, 2021, 3:49am

Restart sampling from a fit object.

Replied in the other thread here: Cookiecutter-cmdstanpy: a template for cmdstanpy projects - #12 by mitzimorris

Funko_Unko · March 17, 2021, 10:05am

@ahartikainen @mitzimorris

Edit: fixed relevant quote.

I’ve opened an issue here: `CmdStanMCMC`'s function `stan_variable` does not handle arrays of vectors correctly. (Issue #361, stan-dev/cmdstanpy, GitHub)
I did run `pip install --upgrade git+https://github.com/stan-dev/cmdstanpy` beforehand, so this should be up to date, shouldn’t it?

Edit: It gets weirder. This issue only arises if we take exactly one sample. I’ve updated the issue.

Funko_Unko · March 17, 2021, 11:13am

I mean restarting with the current metric, step size and points in parameter space. There were three issues:

Couldn’t initialize step sizes with an np.ndarray (this appears to be fixed).
Couldn’t initialize inits with a list of dicts; I opened an issue here: Can't initialize chains with different inits via list of dicts. (Issue #362, stan-dev/cmdstanpy, GitHub)
Couldn’t initialize metrics with a list of np.ndarrays or with a single one. I didn’t open an issue for this.

The last two points aren’t really advertised features of the `sample` method, but they sure would be convenient (at least for me).
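Getting per-chain inits (the "points in parameter space" above) out of a finished fit can be sketched as follows. It assumes draws indexed `[draw][chain][column]` together with matching column names, which is the layout of cmdstanpy's `fit.draws()` and `fit.column_names`. Both bracket (`theta[1,2]`) and CSV-style (`theta.1.2`) names are accepted. `last_draw_inits` and its `keep` filter are hypothetical helpers, not part of cmdstanpy.

```python
import re
from typing import Any, Dict, List, Optional, Sequence, Set

# name[1,2] (cmdstanpy style) or name.1.2 (Stan CSV header style)
_INDEXED = re.compile(r"^([A-Za-z][A-Za-z0-9_]*)(?:\[([0-9,\s]+)\]|((?:\.[0-9]+)+))$")


def _nest(entries: Dict[tuple, float]) -> Any:
    """Turn {(i, j, ...): value} with 1-based indices into nested lists."""
    ndim = len(next(iter(entries)))
    shape = [max(idx[d] for idx in entries) for d in range(ndim)]

    def build(prefix: tuple) -> Any:
        if len(prefix) == ndim:
            return entries[prefix]
        return [build(prefix + (i,)) for i in range(1, shape[len(prefix)] + 1)]

    return build(())


def last_draw_inits(
    column_names: Sequence[str], draws: Sequence, keep: Optional[Set[str]] = None
) -> List[Dict[str, Any]]:
    """Return one inits dict per chain, taken from the final draw.

    Sampler diagnostics (names ending in "__") are dropped; if `keep` is
    given, only those variable names are retained.
    """
    inits = []
    for chain_row in draws[-1]:
        scalars: Dict[str, float] = {}
        indexed: Dict[str, Dict[tuple, float]] = {}
        for name, value in zip(column_names, chain_row):
            if name.endswith("__"):
                continue  # lp__, accept_stat__, stepsize__, ...
            m = _INDEXED.match(name)
            base = m.group(1) if m else name
            if keep is not None and base not in keep:
                continue
            if m is None:
                scalars[base] = float(value)
                continue
            raw = m.group(2).split(",") if m.group(2) else m.group(3).lstrip(".").split(".")
            indexed.setdefault(base, {})[tuple(int(i) for i in raw)] = float(value)
        chain_inits: Dict[str, Any] = dict(scalars)
        for base, entries in indexed.items():
            chain_inits[base] = _nest(entries)
        inits.append(chain_inits)
    return inits
```

Restricting `keep` to the names in the `parameters` block avoids carrying transformed parameters and generated quantities along.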

Anyhow, what I did when subclassing `CmdStanModel` was just hack together the appropriate JSON serialization.
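One way such a serialization could look is sketched below. It writes one inits JSON file and one metric JSON file per chain. It assumes CmdStan's metric-file convention of storing the inverse metric under the key `inv_metric` (a vector for `diag_e`, a matrix for `dense_e`). `write_restart_files` and the file names are hypothetical. Anything with a `.tolist()` method (e.g. NumPy arrays) is converted to plain lists first.

```python
import json
import os
from typing import Any, Dict, List, Sequence


def _to_jsonable(x: Any) -> Any:
    """Convert numpy arrays/scalars (anything with .tolist()) to plain Python."""
    if hasattr(x, "tolist"):
        return x.tolist()
    if isinstance(x, dict):
        return {k: _to_jsonable(v) for k, v in x.items()}
    if isinstance(x, (list, tuple)):
        return [_to_jsonable(v) for v in x]
    return x


def write_restart_files(
    outdir: str, inits: Sequence[Dict[str, Any]], inv_metrics: Sequence[Any]
) -> Dict[str, List[str]]:
    """Write one inits JSON and one metric JSON per chain; return the paths."""
    if len(inits) != len(inv_metrics):
        raise ValueError("need exactly one inits dict and one inv_metric per chain")
    os.makedirs(outdir, exist_ok=True)
    paths: Dict[str, List[str]] = {"inits": [], "metric": []}
    for chain, (init, inv_metric) in enumerate(zip(inits, inv_metrics), start=1):
        init_path = os.path.join(outdir, f"inits_{chain}.json")
        metric_path = os.path.join(outdir, f"metric_{chain}.json")
        with open(init_path, "w") as f:
            json.dump(_to_jsonable(init), f)
        with open(metric_path, "w") as f:
            # CmdStan reads the inverse metric from the "inv_metric" entry
            json.dump({"inv_metric": _to_jsonable(inv_metric)}, f)
        paths["inits"].append(init_path)
        paths["metric"].append(metric_path)
    return paths
```

The resulting path lists could then be passed to `sample` along the lines of `model.sample(inits=paths["inits"], metric=paths["metric"], step_size=step_sizes, ...)`. Whether per-chain lists are accepted for each argument depends on the cmdstanpy version, which is exactly the problem described above.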

Hm, I don’t believe it is possible to infer the size of an array in Stan, or is it? For example, given the data block

int no_elements;
vector[no_elements] v;

and only specifying `v`? Because I’m lazy, I didn’t want to always type (in Python) `data=dict(v=v, no_elements=len(v))` but just `data=dict(v=v)`, and let `no_elements` be inferred from the shape of `v`. Of course, it’s not only about the shapes of arrays, but this is still probably not mainstream functionality.
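Because Stan cannot know which `int` is the size of which vector, the mapping has to be spelled out on the Python side. A minimal sketch is shown below; `with_sizes` and its `sizes` argument are hypothetical. It fills in each size variable from the lengths of the listed entries and raises early if they disagree, so a mismatch is caught before CmdStan ever sees the data.

```python
from typing import Any, Dict, Mapping, Sequence


def with_sizes(data: Mapping[str, Any], sizes: Mapping[str, Sequence[str]]) -> Dict[str, Any]:
    """Return a copy of `data` with size variables filled in.

    `sizes` maps a size name (e.g. "no_elements") to the data entries whose
    length it must equal (e.g. ["v"]). All listed entries must agree.
    """
    out = dict(data)
    for size_name, members in sizes.items():
        lengths = {name: len(data[name]) for name in members}
        distinct = set(lengths.values())
        if len(distinct) != 1:
            raise ValueError(f"inconsistent lengths for {size_name}: {lengths}")
        inferred = distinct.pop()
        # An explicitly supplied size must match what the data implies
        if size_name in out and out[size_name] != inferred:
            raise ValueError(f"{size_name}={out[size_name]} but data implies {inferred}")
        out[size_name] = inferred
    return out
```

For example, `with_sizes(dict(v=v), {"no_elements": ["v"]})` stands in for `dict(v=v, no_elements=len(v))`. `len` on a NumPy array returns the size of its first dimension.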

Yes, the use case is not to just discard the aberrant chains and act as if everything was fine. Instead, I have the following situation:

I have a reference solution/sample/method and want to compare it to other methods. For some of these methods, issues arise during sampling, leading to chains getting stuck and taking a very long time. Eventually they may find the other chains, or they may not. In either case, I want to be able to terminate those chains and work with the data I have, to compare the different methods. Currently I only use the timings, but I could also work with either all of the data from the finished chains, or with all of the complete and partial data.
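For the partial data, a killed chain’s CSV file should still contain the draws written so far. This sketch assumes the Stan CSV layout of `#` comment lines, one header row of column names, and one comma-separated row per draw. Output buffering means the final row may be cut off, so lines without a trailing newline or with the wrong field count are skipped. `read_partial_stan_csv` is a hypothetical helper. Whether warmup rows are present depends on how the run was configured (e.g. saving warmup draws).

```python
import csv
from typing import List, Tuple


def read_partial_stan_csv(path: str) -> Tuple[List[str], List[List[float]]]:
    """Read the header and all complete draw rows from a possibly unfinished Stan CSV."""
    header: List[str] = []
    rows: List[List[float]] = []
    with open(path, newline="") as f:
        for line in f:
            if not line.endswith("\n"):
                continue  # incomplete final line of a killed process
            if line.startswith("#") or not line.strip():
                continue  # config/adaptation comments and blank lines
            fields = next(csv.reader([line]))
            if not header:
                header = fields  # first non-comment line holds the column names
                continue
            if len(fields) != len(header):
                continue
            try:
                rows.append([float(x) for x in fields])
            except ValueError:
                continue  # unparsable row
    return header, rows
```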

Things would of course be easier if I just submitted a huge job to a cluster, but apart from not having access right now, working locally in this way shortens the development iteration time.

Just the Stan functions/model blocks. See, e.g., this post (Adjoint ODE Prototype - RFC - Please Test - #7 by Funko_Unko) or this monster of a figure (Adjoint ODE Prototype - RFC - Please Test - #16 by Funko_Unko) for an example.

mitzimorris · March 17, 2021, 6:36pm

This won’t fly, because the fact that `no_elements` is an array dimension is not something that can easily be inferred. From the point of view of the generated C++ program, it’s just another int variable.

Also, consider this:

data {
  int no_elements;
  vector[no_elements] foo;
  vector[no_elements] bar;
 ...
}

When foo and bar have different lengths, where is the problem?

Funko_Unko · March 18, 2021, 8:25am

Yes, I was just expressing my confusion at @ahartikainen’s question, since I did not believe it possible (in practice or in principle): Stan does not and cannot know that this int is just the size of that vector. I guess such functionality could be added, but I don’t believe anyone would want to do that; it looks like very little benefit for a lot of headache.

One of the reasons I wanted this functionality is that I have different models, which can all be characterized by some number of main states, which then induce a model-specific number of auxiliary states and parameters. This can of course be partially done within Stan, but I did not do this for two reasons.

First, I wanted to keep the Stan interface as similar as possible across models, and only change the functions (containing the ODE) and model blocks.
Second, I believe there were some issues with using inferred data (quantities from the transformed data block) to specify the sizes of data arrays/vectors. I don’t know whether there is a way to do this in Stan, and as I had a solution ready, I didn’t bother investigating further.

There would be no problem, just an exception thrown ;)

mitzimorris · March 18, 2021, 3:44pm

Correct, this isn’t possible: variables in the data block must be defined before being referenced, so all array dimensions used in data variable declarations must be supplied as data.
