When calling [CmdStanMCMC].draws_pd("foo"), I get a data frame with the columns chain__, iter__, and draw__ in addition to "foo".
When calling [CmdStanMCMC].draws_pd(), I get a data frame with the columns chain__, iter__, draw__, lp__, accept_stat__, stepsize__, treedepth__, n_leapfrog__, divergent__, and energy__.
Is this the new behaviour of draws_pd? If I don’t want any of the diagnostic data columns returned, what should I call draws_pd with?
python: 3.9.15 (main, Nov 24 2022, 14:31:59)
LOCALE: ('en_US', 'UTF-8')
cmdstan: (2, 33)
We added those columns to allow reconstructing the chains if required (e.g., if you wanted to groupby them). Whether or not vars should work to request that they not be included is something I didn’t consider during implementation.
I’d like the opinions of @mitzimorris and @roualdes, who requested the feature originally.
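As a rough illustration of the groupby use case mentioned above, here is a minimal sketch. The data frame is a hand-built, hypothetical stand-in for what draws_pd() returns (two chains, three iterations, one parameter "foo"); it is not output from a real fit.

```python
import pandas as pd

# Hypothetical stand-in for a draws_pd() result: 2 chains x 3 iterations.
df = pd.DataFrame({
    "chain__": [1, 1, 1, 2, 2, 2],
    "iter__": [1, 2, 3, 1, 2, 3],
    "draw__": [1, 2, 3, 4, 5, 6],
    "foo": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Per-chain summary, which needs the chain__ column.
per_chain_mean = df.groupby("chain__")["foo"].mean()

# Rebuild an (iterations x chains) layout for a single parameter.
foo_by_chain = df.pivot(index="iter__", columns="chain__", values="foo")
```

Without chain__ and iter__, the rows would be one flat sequence of draws, and this kind of per-chain view would have to be rebuilt from the number of chains and iterations.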
Recently, for inclusion in CmdStanPy 1.2, I requested that draws_pd() include the columns draw__, see cmdstanpy issue #676. It’s my understanding that CmdStanPy versions prior to 1.2 also included the columns lp__, …
My thinking behind the inclusion of all columns, including diagnostic and chain-related information, is that more information by default is better. This is especially true, in my opinion, because from a user’s perspective it’s harder to get the extra information contained in these columns than it is to remove them, e.g.
df = [CmdStanMCMC].draws_pd()
df = df.drop(columns = df.filter(regex = ".*__").columns) # or add inplace = True
See the pandas.DataFrame.drop() documentation.
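A self-contained variant of the same idea, as a sketch: the data frame below is a hypothetical stand-in for a draws_pd() result, and "bar__baz" is a made-up parameter name included only to show an edge case. The regex ".*__" matches any column containing a double underscore, so selecting on the "__" suffix may be a safer filter if parameter names can contain "__" in the middle.

```python
import pandas as pd

# Hypothetical stand-in for a draws_pd() result (values are made up).
df = pd.DataFrame({
    "chain__": [1, 1],
    "iter__": [1, 2],
    "draw__": [1, 2],
    "lp__": [-3.2, -3.0],
    "accept_stat__": [0.9, 0.95],
    "foo": [0.5, 0.7],
    "bar__baz": [1.0, 2.0],  # hypothetical name with "__" in the middle
})

# Keep only columns that do NOT end in "__".
params_only = df.loc[:, ~df.columns.str.endswith("__")]

# The regex from the post also drops "bar__baz", because ".*__"
# matches anywhere in the column name.
regex_dropped = df.drop(columns=df.filter(regex=".*__").columns)
```

Here params_only keeps both foo and bar__baz, while regex_dropped keeps only foo. Anchoring the pattern as regex="__$" should behave like the suffix check.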
A more specific reason I filed issue #676 is that CmdStanR by default includes the columns draw__ in a draws data frame, e.g. [CmdStanMCMC]$draws(format = "df"). So I saw this as a step to better align the CmdStan* interfaces.
If desired, @akcchoi, please open an issue on the CmdStanPy GitHub repository to discuss adding a flag, or some such option, to exclude the diagnostic and/or chain information.
Thanks. Will open an issue.