When calling [CmdStanMCMC].draws_pd("foo"), I get a data frame with the columns chain__, iter__, and draw__ in addition to foo.
When calling [CmdStanMCMC].draws_pd(), I get a data frame with the columns chain__, iter__, draw__, lp__, accept_stat__, stepsize__, treedepth__, n_leapfrog__, divergent__, and energy__.
Is this the new behaviour of draws_pd? If I don't want any of the diagnostic columns returned, what should I pass to draws_pd?
We added those columns so that the chains can be reconstructed if needed (e.g., if you want to group by them). Whether the vars argument should also be able to exclude them is something I didn't consider during implementation.
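As a minimal sketch of what these columns make possible, assuming the frame returned by draws_pd("foo") has chain__ and iter__ columns as described above, pandas can rebuild a per-chain layout or compute per-chain summaries. The frame below is a made-up toy, not real CmdStanPy output.

```python
import pandas as pd

# Made-up toy frame standing in for draws_pd("foo") output: two chains,
# three iterations each. The real frame also has a draw__ column, which
# this sketch does not need, so it is omitted here.
draws = pd.DataFrame(
    {
        "chain__": [1, 1, 1, 2, 2, 2],
        "iter__": [1, 2, 3, 1, 2, 3],
        "foo": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    }
)

# Rebuild an (iterations x chains) table: one column of foo draws per chain.
by_chain = draws.pivot(index="iter__", columns="chain__", values="foo")

# Per-chain summaries via groupby, e.g. to compare chains at a glance.
chain_means = draws.groupby("chain__")["foo"].mean()

print(by_chain)
print(chain_means)
```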
I’d like the opinions of @mitzimorris and @roualdes, who requested the feature originally.
Recently, for inclusion in CmdStanPy 1.2, I requested that draws_pd() include the columns chain__, iter__, and draw__ (see CmdStanPy issue #676). It's my understanding that CmdStanPy versions prior to 1.2 also included the columns lp__, …, and energy__.
My thinking behind returning all columns, diagnostic and chain-related ones included, is that more information by default is better. In my opinion this is especially true because, from a user's perspective, it's harder to recover the information contained in these columns than it is to remove the columns afterwards.
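For example, here is a minimal sketch of removing the extra columns with pandas. It assumes the Stan convention that model variable names cannot end in a double underscore, so every column ending in __ is either a sampler diagnostic or one of chain__/iter__/draw__. drop_double_underscore_columns is a hypothetical helper, not part of CmdStanPy, and the toy frame is made up.

```python
import pandas as pd


def drop_double_underscore_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a new frame without any column whose name ends in "__"."""
    # Assumption: Stan reserves names ending in "__", so only sampler
    # diagnostics (lp__, accept_stat__, ...) and chain__/iter__/draw__ match.
    extra = [col for col in df.columns if str(col).endswith("__")]
    return df.drop(columns=extra)


# Made-up toy frame shaped like draws_pd() output (one row per draw).
draws = pd.DataFrame(
    {
        "chain__": [1, 1, 2, 2],
        "iter__": [1, 2, 1, 2],
        "draw__": [1, 2, 3, 4],
        "lp__": [-7.1, -6.9, -7.3, -7.0],
        "divergent__": [0, 0, 0, 0],
        "foo": [0.2, 0.4, 0.1, 0.3],
        "theta[1]": [1.5, 1.2, 1.8, 1.1],
    }
)

params_only = drop_double_underscore_columns(draws)
print(params_only.columns.tolist())  # ['foo', 'theta[1]']
```

To remove only the bookkeeping columns and keep the diagnostics, draws.drop(columns=["chain__", "iter__", "draw__"]) does that directly.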
A more specific reason I filed issue #676 is that CmdStanR's draws data frame, e.g. from [CmdStanMCMC]$draws(format = "df"), includes chain, iteration, and draw columns by default (named .chain, .iteration, and .draw there). So I saw this as a step toward better aligning the CmdStan* interfaces.
If you'd like, @akcchoi, please open an issue on the CmdStanPy GitHub repository to discuss adding a flag, or a similar option, to exclude the diagnostic and/or chain columns.