Extracting a variable from a stanfit object fails when `save_warmup=True`

I’m asking this here instead of using the issue tracker since I’m not entirely sure whether this is a bug or a case of PEBCAK.

I am fitting a model with cmdstanpy, as follows:

sample_args = {'iter': 100, 'warmup': 10, 'chains': 1, 'seed': 42, 'adapt_delta': 0.8, 'max_treedepth': 16, 'save_warmup': True}
fit_model = sample_cmdstanpy(my_model, input_data, sample_args)

(Please disregard the low numbers of draws and chains. This is just to illustrate the problem, which also occurs for more realistic values of these parameters.)

After the fitting/sampling is completed, I attempt to extract a variable (created in the generated quantities block of the model) from the resulting stanfit object (to be precise, the object type is: cmdstanpy.stanfit.CmdStanMCMC). This is done using

my_variable = fit_model.stan_variable('quantity_of_interest')

This causes the following exception

File ".../venv/lib/python3.8/site-packages/cmdstanpy/stanfit.py", line 762, in stan_variable
    self._draws[
ValueError: cannot reshape array of size 561400 into shape (110,5614)

I expect 5614 values for each draw. Since I am asking for 100 post-warmup draws + 10 warmup draws, and exactly 10x5614 entries are missing somewhere along the way (110x5614 = 617540 expected, 561400 found), this suggests to me that the warmup draws are somehow lost.
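To spell out that arithmetic, here is a quick check that uses only the numbers from my sampling arguments and the error message (the variable names are just for illustration):

```python
# Numbers taken from the sampling arguments and the ValueError above.
draws_sampling = 100      # 'iter'
draws_warmup = 10         # 'warmup'
values_per_draw = 5614    # second dimension in the reshape target
found = 561400            # "array of size 561400" in the error message

expected = (draws_sampling + draws_warmup) * values_per_draw
missing = expected - found

print(expected)                   # total entries the reshape target needs
print(missing)                    # entries that are not there
print(missing / values_per_draw)  # how many draws that corresponds to
```

The shortfall corresponds to exactly 10 draws, i.e. the number of warmup iterations.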

I should also say that the code in question worked (i.e., supplied all post-warmup draws) when save_warmup was set to False.

Digging deeper, I put a break-point in stan_variable, the code of which is:

    def stan_variable(self, name: str) -> pd.DataFrame:
        """
        Return a new DataFrame which contains the set of post-warmup draws
        for the named Stan program variable.  Flattens the chains.
        Underlyingly draws are in chain order, i.e., for a sample
        consisting of N chains of M draws each, the first M array
        elements are from chain 1, the next M are from chain 2,
        and the last M elements are from chain N.

        * If the variable is a scalar variable, the shape of the DataFrame is
          ( draws X chains, 1).
        * If the variable is a vector, the shape of the DataFrame is
          ( draws X chains, len(vector))
        * If the variable is a matrix, the shape of the DataFrame is
          ( draws X chains, size(dim 1) X size(dim 2) )
        * If the variable is an array with N dimensions, the shape of the
          DataFrame is ( draws X chains, size(dim 1) X ... X size(dim N))

        :param name: variable name
        """
        if name not in self._stan_variable_dims:
            raise ValueError('unknown name: {}'.format(name))
        self._assemble_draws()
        dim0 = self.num_draws * self.runset.chains
        dims = np.prod(self._stan_variable_dims[name])
        pattern = r'^{}(\[[\d,]+\])?$'.format(name)
        names, idxs = [], []
        for i, column_name in enumerate(self.column_names):
            if re.search(pattern, column_name):
                names.append(column_name)
                idxs.append(i)
        return pd.DataFrame(
            self._draws[
                self._draws_warmup:, :, idxs
            ].reshape((dim0, dims), order='A'),
            columns=names
        )

Checking the values of the variables here, dim0=110 and dims=5614. So it looks like, in the final return statement (where the error occurs), the function takes the entries of self._draws starting at index self._draws_warmup=10 (so, the 100 post-warmup draws) and tries to reshape those into shape (num_draws_including_warmup, num_values_generated_per_draw).
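The mismatch can be reproduced with plain NumPy, without running Stan at all. This is only a sketch: I am assuming, from the indexing in the return statement, that self._draws is laid out as (draws, chains, columns), and the array and variable names below are stand-ins, not cmdstanpy internals.

```python
import numpy as np

# Stand-in values matching my run (assumed layout: draws x chains x columns).
draws_warmup, draws_sampling, chains, n_cols = 10, 100, 1, 5614
draws = np.zeros((draws_warmup + draws_sampling, chains, n_cols))
idxs = list(range(n_cols))  # column indices matched by the regex

# Same slicing as in stan_variable: drops the warmup rows.
post_warmup = draws[draws_warmup:, :, idxs]

# dim0 as observed in the debugger: it still counts the warmup draws.
dim0 = (draws_warmup + draws_sampling) * chains

try:
    post_warmup.reshape((dim0, n_cols), order='A')
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(post_warmup.shape)
print(error_message)
```

The sliced array holds 100 draws, while the reshape target asks for 110 rows, which gives the same kind of ValueError as in the traceback.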

Even stan_variable's docstring indicates that it returns the post-warmup draws… so it seems to be performing as advertised, but not playing nicely with the save_warmup option.

Is it possible that the line

dim0 = self.num_draws * self.runset.chains

should be

dim0 = self._draws_sampling * self.runset.chains

instead?
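A minimal sketch of what that change would do, again with plain NumPy and stand-in variables rather than the real cmdstanpy attributes (draws_sampling plays the role of self._draws_sampling, and the (draws, chains, columns) layout is my assumption):

```python
import numpy as np

# Stand-in values matching my run (assumed layout: draws x chains x columns).
draws_warmup, draws_sampling, chains, n_cols = 10, 100, 1, 5614
draws = np.zeros((draws_warmup + draws_sampling, chains, n_cols))
idxs = list(range(n_cols))

# Proposed: count only the post-warmup draws when computing the row count.
dim0 = draws_sampling * chains

flat = draws[draws_warmup:, :, idxs].reshape((dim0, n_cols), order='A')
print(flat.shape)
```

This only checks that the shapes line up for a single chain. It says nothing about whether the flattened rows end up in the chain order described in the docstring when there are several chains.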

Or, am I missing something obvious and am just doing something wrong?

Thanks a bunch!
Chai

P.S.
I could, if needed, attempt to generate minimal code to recreate the issue, but unless I’m misreading the python code that generates the exception, I think the problem should be clear even without that.

  • Operating System: Ubuntu 20.04.1
  • CmdStan Version: 2.25
  • CmdStanPy Version: 0.9.67
  • Compiler/Toolkit: g++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0

Yes, there is an error here. I need to find some time to hunt this bug down.

Here is the GitHub issue

I feel like I have fixed this one already once, but probably never pushed the changes to GitHub.


Thank you for confirming! (And sorry to have missed the open issue somehow… possibly because I first encountered this a few weeks ago, when the issue was not yet open, and only now found the time to ask about it.)

Does the solution I suggested sound reasonable? (At least in my case, self._draws_sampling=100, so I think this would avoid the error and return the post-warmup draws as described in the docstring.)
I’d be happy to fork and try to fix it myself if you think that’s helpful.

Or do you think stan_variable should return the warmup draws too when save_warmup==True?