Cmdstanpy error in fit.summary()?

I have whittled a bug in my results down to this disturbing discrepancy between fit.summary() and fit.draws_pd().describe().

It seems the former is (a) rounding by significant figures , but also (b) making mistakes.

In particular, the problem that led me here is the value for StdDev of posterior_fraclow in fit.summary(). Please compare this with the true std given by pandas. Is there something I don’t understand about what fit.summary() should do?

ipdb>  fit.summary(percentiles=self.percentiles)
                                Mean      MCSE  StdDev       2.5%        25%        50%        75%      97.5%   N_Eff  N_Eff/s  R_hat
name                                                                                                                                 
lp__                      -9700.0000  0.120000  3.1000 -9700.0000 -9700.0000 -9700.0000 -9700.0000 -9700.0000   640.0   0.0270    1.0
betaN[1]                      2.1000  0.015000  0.3500     1.4000     1.9000     2.1000     2.4000     2.8000   580.0   0.0240    1.0
betaN[2]                      0.0610  0.002000  0.0700    -0.0760     0.0140     0.0590     0.1100     0.2000  1209.0   0.0500    1.0
betaN[3]                      0.3300  0.005900  0.1400     0.0910     0.2300     0.3200     0.4200     0.6300   579.0   0.0240    1.0
betaS[1]                      0.1700  0.000450  0.0270     0.1200     0.1500     0.1700     0.1900     0.2200  3571.0   0.1500    1.0
betaS[2]                      0.2600  0.000780  0.0320     0.2000     0.2400     0.2600     0.2800     0.3200  1638.0   0.0680    1.0
cH[1]                        -7.4000  0.120000  1.9000   -12.0000    -8.3000    -6.9000    -6.0000    -5.1000   254.0   0.0110    1.0
cH[2]                        -4.7000  0.010000  0.2300    -5.1000    -4.9000    -4.7000    -4.5000    -4.2000   486.0   0.0200    1.0
cH[3]                        -3.8000  0.005300  0.1300    -4.0000    -3.9000    -3.8000    -3.7000    -3.5000   607.0   0.0250    1.0
cH[4]                        -3.0000  0.003300  0.0880    -3.2000    -3.1000    -3.0000    -2.9000    -2.8000   713.0   0.0300    1.0
cH[5]                        -2.3000  0.002100  0.0700    -2.4000    -2.4000    -2.3000    -2.3000    -2.2000  1088.0   0.0450    1.0
cH[6]                        -1.6000  0.007300  0.1600    -2.1000    -1.7000    -1.6000    -1.5000    -1.4000   486.0   0.0200    1.0
cH[7]                        -1.0000  0.004100  0.0970    -1.3000    -1.1000    -0.9900    -0.9400    -0.8500   548.0   0.0230    1.0
cH[8]                         0.0078  0.002400  0.0620    -0.1200    -0.0350     0.0070     0.0480     0.1300   656.0   0.0270    1.0
cH[9]                         1.6000  0.004100  0.1100     1.4000     1.5000     1.5000     1.6000     1.8000   697.0   0.0290    1.0
cH[10]                        4.0000  0.062000  1.5000     2.7000     3.1000     3.5000     4.2000     8.1000   604.0   0.0250    1.0
cL[1]                        -3.5000  0.073000  1.0000    -6.5000    -3.7000    -3.3000    -3.0000    -2.6000   200.0   0.0083    1.0
cL[2]                        -0.9400  0.033000  0.7100    -2.8000    -1.2000    -0.7900    -0.4600    -0.0025   453.0   0.0190    1.0
posterior_hist_high[1]        0.0015  0.000080  0.0017     0.0000     0.0002     0.0008     0.0024     0.0060   469.0   0.0200    1.0
posterior_hist_high[2]        0.0071  0.000026  0.0017     0.0040     0.0060     0.0070     0.0082     0.0110  4288.0   0.1800    1.0
posterior_hist_high[3]        0.0120  0.000037  0.0022     0.0078     0.0100     0.0120     0.0130     0.0160  3424.0   0.1400    1.0
posterior_hist_high[4]        0.0230  0.000046  0.0030     0.0180     0.0210     0.0230     0.0250     0.0290  4196.0   0.1700    1.0
posterior_hist_high[5]        0.0390  0.000075  0.0039     0.0310     0.0360     0.0390     0.0420     0.0460  2667.0   0.1100    1.0
posterior_hist_high[6]        0.0670  0.001000  0.0230     0.0060     0.0550     0.0700     0.0840     0.1000   513.0   0.0210    1.0
posterior_hist_high[7]        0.0900  0.000090  0.0057     0.0790     0.0860     0.0900     0.0940     0.1000  4015.0   0.1700    1.0
posterior_hist_high[8]        0.2000  0.000120  0.0079     0.1900     0.2000     0.2000     0.2100     0.2200  4401.0   0.1800    1.0
posterior_hist_high[9]        0.2800  0.000160  0.0089     0.2600     0.2800     0.2800     0.2900     0.3000  3164.0   0.1300    1.0
posterior_hist_high[10]       0.1300  0.000110  0.0066     0.1200     0.1300     0.1300     0.1400     0.1500  3563.0   0.1500    1.0
posterior_hist_high[11]       0.0280  0.000700  0.0180     0.0000     0.0130     0.0270     0.0410     0.0590   631.0   0.0260    1.0
posterior_hist_low[1]         0.0045  0.000073  0.0020     0.0002     0.0032     0.0046     0.0058     0.0082   768.0   0.0320    1.0
posterior_hist_low[2]         0.0350  0.001000  0.0220     0.0010     0.0190     0.0320     0.0480     0.0900   497.0   0.0210    1.0
posterior_hist_low[3]         0.0760  0.000740  0.0180     0.0420     0.0620     0.0760     0.0910     0.1100   600.0   0.0250    1.0
posterior_fraclow             0.1000  0.000000  0.0000     0.1000     0.1000     0.1000     0.1000     0.2000   535.9   0.0000    1.0
posterior_mean_high           7.1000  0.000000  0.1000     6.9000     7.0000     7.1000     7.2000     7.2000   773.1   0.0000    1.0
posterior_mean_low            8.2000  0.000000  0.6000     7.1000     7.7000     8.1000     8.6000     9.4000   509.7   0.0000    1.0
posterior_hist_latent[1]      0.0017  0.000090  0.0020     0.0000     0.0002     0.0010     0.0026     0.0068   466.0   0.0190    1.0
posterior_hist_latent[2]      0.0082  0.000032  0.0019     0.0046     0.0068     0.0080     0.0094     0.0120  3649.0   0.1500    1.0
posterior_hist_latent[3]      0.0140  0.000046  0.0025     0.0088     0.0120     0.0140     0.0150     0.0190  2974.0   0.1200    1.0
posterior_hist_latent[4]      0.0270  0.000063  0.0035     0.0200     0.0240     0.0260     0.0290     0.0330  3101.0   0.1300    1.0
posterior_hist_latent[5]      0.0450  0.000100  0.0045     0.0360     0.0420     0.0450     0.0480     0.0530  1961.0   0.0820    1.0
posterior_hist_latent[6]      0.0760  0.001100  0.0240     0.0072     0.0640     0.0800     0.0930     0.1100   505.0   0.0210    1.0
posterior_hist_latent[7]      0.1000  0.000210  0.0074     0.0880     0.0980     0.1000     0.1100     0.1200  1252.0   0.0520    1.0
posterior_hist_latent[8]      0.2300  0.000410  0.0120     0.2100     0.2200     0.2300     0.2400     0.2500   885.0   0.0370    1.0
posterior_hist_latent[9]      0.3200  0.000580  0.0150     0.2900     0.3100     0.3200     0.3300     0.3500   671.0   0.0280    1.0
posterior_hist_latent[10]     0.1500  0.000270  0.0092     0.1300     0.1400     0.1500     0.1600     0.1700  1184.0   0.0490    1.0
posterior_hist_latent[11]     0.0300  0.000750  0.0190     0.0000     0.0150     0.0310     0.0450     0.0630   633.0   0.0260    1.0
posterior_mean_latent         7.1000  0.000000  0.1000     6.9000     7.0000     7.1000     7.1000     7.2000   781.2   0.0000    1.0



ipdb>  fit.draws_pd().describe().T
                            count         mean         std           min          25%          50%          75%          max
lp__                       4000.0 -9737.438482    3.110194 -9.753210e+03 -9739.410000 -9737.140000 -9735.110000 -9730.570000
accept_stat__              4000.0     0.886251    0.201089  1.733110e-07     0.883991     0.962808     0.992227     1.000000
stepsize__                 4000.0     0.012548    0.001323  1.083700e-02     0.011523     0.012584     0.013609     0.014186
treedepth__                4000.0     7.862250    0.778091  2.000000e+00     8.000000     8.000000     8.000000    10.000000
n_leapfrog__               4000.0   328.577500  168.906977  6.000000e+00   255.000000   255.000000   511.000000  1023.000000
divergent__                4000.0     0.019500    0.138291  0.000000e+00     0.000000     0.000000     0.000000     1.000000
energy__                   4000.0  9745.921030    4.286429  9.734660e+03  9742.890000  9745.610000  9748.580000  9763.430000
betaN[1]                   4000.0     2.132952    0.354704  1.256090e+00     1.876807     2.126305     2.373512     3.536540
betaN[2]                   4000.0     0.061456    0.070473 -2.137460e-01     0.013832     0.058983     0.107139     0.363367
betaN[3]                   4000.0     0.334623    0.141327  8.126970e-03     0.233353     0.318071     0.423662     0.908631
betaS[1]                   4000.0     0.172014    0.026604  6.404760e-02     0.154348     0.172261     0.189735     0.265383
betaS[2]                   4000.0     0.263793    0.031640  1.499710e-01     0.242053     0.263876     0.284507     0.365405
cH[1]                      4000.0    -7.389239    1.858605 -1.330360e+01    -8.344217    -6.892330    -5.986508    -4.759050
cH[2]                      4000.0    -4.688734    0.231254 -5.425980e+00    -4.851625    -4.695135    -4.533048    -3.912510
cH[3]                      4000.0    -3.786661    0.129760 -4.255640e+00    -3.878600    -3.790660    -3.698860    -3.337060
cH[4]                      4000.0    -2.999632    0.088324 -3.287630e+00    -3.060935    -3.002090    -2.939213    -2.684350
cH[5]                      4000.0    -2.307452    0.069657 -2.569810e+00    -2.355427    -2.311110    -2.262328    -2.063150
cH[6]                      4000.0    -1.626288    0.160516 -2.294240e+00    -1.699467    -1.596975    -1.519730    -1.282450
cH[7]                      4000.0    -1.004561    0.097060 -1.371650e+00    -1.058200    -0.994765    -0.937240    -0.717693
cH[8]                      4000.0     0.007829    0.061590 -2.003920e-01    -0.034750     0.006998     0.047626     0.233727
cH[9]                      4000.0     1.550612    0.108371  1.269790e+00     1.468953     1.542665     1.628295     1.856150
cH[10]                     4000.0     4.007934    1.520068  2.474910e+00     3.115775     3.504180     4.243977    15.156800
cL[1]                      4000.0    -3.532115    1.038526 -1.296720e+01    -3.686590    -3.324145    -3.031042    -1.998350
cL[2]                      4000.0    -0.937627    0.711832 -7.596520e+00    -1.238128    -0.790392    -0.459250     0.385840
posterior_hist_high[1]     4000.0     0.001525    0.001722  0.000000e+00     0.000200     0.000800     0.002400     0.008602
posterior_hist_high[2]     4000.0     0.007134    0.001707  2.200440e-03     0.006001     0.007001     0.008202     0.013403
posterior_hist_high[3]     4000.0     0.011988    0.002193  6.001200e-03     0.010402     0.011802     0.013403     0.022604
posterior_hist_high[4]     4000.0     0.023194    0.002966  1.460290e-02     0.021204     0.023205     0.025005     0.034607
posterior_hist_high[5]     4000.0     0.039038    0.003879  2.560510e-02     0.036407     0.039008     0.041608     0.053611
posterior_hist_high[6]     4000.0     0.067493    0.022675  0.000000e+00     0.055211     0.070414     0.083617     0.119224
posterior_hist_high[7]     4000.0     0.090157    0.005699  7.121420e-02     0.086217     0.090018     0.093819     0.110022
posterior_hist_high[8]     4000.0     0.200940    0.007854  1.740350e-01     0.195589     0.201040     0.206241     0.229046
posterior_hist_high[9]     4000.0     0.281001    0.008902  2.492500e-01     0.275055     0.280856     0.287057     0.312062
posterior_hist_high[10]    4000.0     0.134251    0.006627  1.118220e-01     0.129626     0.134227     0.138628     0.165033
posterior_hist_high[11]    4000.0     0.027630    0.017556  0.000000e+00     0.013003     0.027405     0.041008     0.076415
posterior_hist_low[1]      4000.0     0.004462    0.002024  0.000000e+00     0.003201     0.004601     0.005801     0.011802
posterior_hist_low[2]      4000.0     0.035232    0.022478  0.000000e+00     0.018604     0.032206     0.047609     0.110422
posterior_hist_low[3]      4000.0     0.075953    0.018173  2.520500e-02     0.062012     0.075615     0.090668     0.121024
posterior_fraclow          4000.0     0.115647    0.032954  3.660730e-02     0.091418     0.111622     0.136277     0.217443
posterior_mean_high        4000.0     7.095085    0.076369  6.839950e+00     7.042810     7.099115     7.150103     7.329180
posterior_mean_low         4000.0     8.163858    0.616248  6.639780e+00     7.703155     8.085820     8.561182    10.000000
posterior_hist_latent[1]   4000.0     0.001740    0.001952  0.000000e+00     0.000200     0.001000     0.002601     0.010802
posterior_hist_latent[2]   4000.0     0.008179    0.001928  2.600520e-03     0.006801     0.008002     0.009402     0.017003
posterior_hist_latent[3]   4000.0     0.013725    0.002497  6.801360e-03     0.012002     0.013603     0.015403     0.025205
posterior_hist_latent[4]   4000.0     0.026564    0.003484  1.620320e-02     0.024205     0.026405     0.028806     0.040408
posterior_hist_latent[5]   4000.0     0.044680    0.004514  3.040610e-02     0.041608     0.044609     0.047609     0.063013
posterior_hist_latent[6]   4000.0     0.076277    0.024167  0.000000e+00     0.063813     0.079616     0.093419     0.129826
posterior_hist_latent[7]   4000.0     0.102754    0.007400  7.901580e-02     0.097619     0.102420     0.107421     0.130226
posterior_hist_latent[8]   4000.0     0.228097    0.012109  1.884380e-01     0.219644     0.227245     0.235647     0.278456
posterior_hist_latent[9]   4000.0     0.316985    0.015136  2.734550e-01     0.306261     0.316063     0.326465     0.374075
posterior_hist_latent[10]  4000.0     0.150628    0.009166  1.196240e-01     0.144229     0.150230     0.156431     0.181836
posterior_hist_latent[11]  4000.0     0.030371    0.018747  0.000000e+00     0.015003     0.030606     0.045009     0.081216
posterior_mean_latent      4000.0     7.083868    0.073540  6.839770e+00     7.034760     7.088220     7.136230     7.311260

Here’s the distribution of samples for posterior_fraclow:

tmp

How does these results compare to arviz.summary?

az.summary() agrees with the second, and disagrees with fit.summary(), although az’s estimates are also rounded somewhat.

az.summary(fit).loc['posterior_fraclow']
mean           0.116
sd             0.033
hdi_3%         0.055
hdi_97%        0.177
mcse_mean      0.001
mcse_sd        0.001
ess_bulk     588.000
ess_tail     443.000
r_hat          1.010
Name: posterior_fraclow, dtype: float64

For my own ref: 6742956948

fit.summary() call CmdStan’s stansummary utility. the stansummary utility produces another CSV file of summary statics. CmdStanPy reads the results into a pandas dataframe.

to get more precision, you can use the sig_figs argument.

the values reported by draws_pd will have as much precision as the output Stan CSV files - cf the CmdStan manual: 8 Command-Line Interface Overview | CmdStan User’s Guide. Therefore you can set the sig_figs argument when you run the sampler and you will get more precision - but your MCSE Monte Carlo Standard Error gives you a bounds on how much of that is useful.

Estimation by sampling produces an approximate value for the model parameters; the MCSE statistic indicates the amount of noise in the estimate. Therefore MCSE column is placed next to the sample mean column, in order to make it easy to compare this sample with others.

Thank you. Putting a high enough sig_figs argument gets around the problem, but by default it is not giving any consistent number of sig figs. It is, as I have shown, giving zero sig figs for some values – including MCSE!!

Had it been a consistent but too-low number of sig_figs, I would even thought to have looked for a sig_figs argument. Instead, it’s giving subtly terrible wrong values by default. Would you agree?

see edited comment above

Is there something I missed in your edit?

Actually, there is also a difference in definitions of “significant figures” here between what I understand (see Significant figures - Wikipedia) and what seems to be implemented. Look at this:


ipdb> fit.summary(sig_figs=5, percentiles=self.percentiles).loc['posterior_fraclow']                                                                                                                                                                                                                                                                            
Mean         0.1156                                                                                                                                                                                                                                                                                                                                             
MCSE         0.0014                                                                                                                                                                                                                                                                                                                                             
StdDev       0.0330                                                                                                                                                                                                                                                                                                                                             
2.5%         0.0612                                                                                                                                                                                                                                                                                                                                             
25%          0.0914                                                                                                                                                                                                                                                                                                                                             
50%          0.1116                                                                                                                                                                                                                                                                                                                                             
75%          0.1364                                                                                                                                                                                                                                                                                                                                             
97.5%        0.1904                                                                                                                                                                                                                                                                                                                                             
N_Eff      535.8536                                                                                                                                                                                                                                                                                                                                             
N_Eff/s      0.0223                                                                                                                                                                                                                                                                                                                                             
R_hat        1.0110                                                                                                                                                                                                                                                                                                                                             
Name: posterior_fraclow, dtype: float64                                                                                                                                                                                                                                                                                                                         
ipdb> fit.summary(sig_figs=None, percentiles=self.percentiles).loc['posterior_fraclow']                                                                                                                                                                                                                                                                         
Mean         0.1                                                                                                                                                                                                                                                                                                                                                
MCSE         0.0                                                                                                                                                                                                                                                                                                                                                
StdDev       0.0                                                                                                                                                                                                                                                                                                                                                
2.5%         0.1
25%          0.1
50%          0.1
75%          0.1
97.5%        0.2
N_Eff      535.9
N_Eff/s      0.0
R_hat        1.0
Name: posterior_fraclow, dtype: float64

That looks like a number of decimal places, not a number of significant figures, according to the only definitions I’ve known for decades.

I’m in a state of disbelief right now, so I will leave this up to others to decide if there’s a (high priority!) bug here to report to cmdstan folk.

if you think this is an error, it’s a problem with the underlying Stan library - stan/chains.hpp at develop · stan-dev/stan · GitHub
please file an issue on the Stan repo accordingly.

I’m just a naive user and a very naive statistician. Do you not think this is an error?

Clarification: I’m not likely to report something that seems like such a major error just based on my own assessment, since I believe many people who know more than me use this software and have been doing so for a long time. Clearly I must be doing (or thinking) something different from everyone else.

I believe many people who know more than me use this software and have been doing so for a long time. Clearly I must be doing (or thinking) something different from everyone else.

much of the logic in the C++ code was re-implemented directly in R to reflect changes to the theory around the R-hat statistics, and therefore most statisticians wouldn’t have run into this much.

as a C++ and Python developer, this is a concern and definitely on my radar. as suggested above, if you can use ArviZ, please do so.

The logic in CmdStanPy is here: cmdstanpy/mcmc.py at 6edb7ade2ff1eb44129f2220ecaf01b792de4a51 · stan-dev/cmdstanpy · GitHub

What’s happening is that if the sig_figs arg isn’t specified, then stansummary is called with sig_figs=2, which is also the default for CmdStan. taking a closer look at the C++, it would seem that this has the very bad effect of clobbering the precision of the computed estimates,
resulting in numbers like this:

"lp__",-66,0.024,0.98,-68,-66,-65,1.7e+03,3.3e+03,1
"accept_stat__",0.92,0.0016,0.1,0.71,0.96,1,4.1e+03,8e+03,1
"stepsize__",0.82,0.04,0.057,0.73,0.86,0.88,2,3.9,2e+13
"treedepth__",2,0.037,0.58,1,2,3,2.5e+02,4.8e+02,1
"n_leapfrog__",4,0.25,2,1,3,7,62,1.2e+02,1

The problem is the use of scientific notation used by the stansummary save to CSV file outputs - that’s where the precision is being lost. Maybe CmdStanPy is the only place stansummary save to CSV file is being used, which would explain why you noticed it first.

filed an issue: CmdStanMCMC `summary` method - don't specify `sig_figs` unless use does · Issue #550 · stan-dev/cmdstanpy · GitHub

The documentation on stansummary is wrong. The default number of significant figures is 2 for the display on the console, but for the csv file it is six:

Cmdstanpy, based on the assumption the doc is correct, defaults to 2 when nothing is supplied, but it should default to 6, and the significant figures argument should be updated to explain this.

Also, @cpbl, I believe you are correct, this is abuse of the terminology “significant figures”. The number is actually used as defined in std::precision ios_base::precision - C++ Reference and should probably not be called what it is

1 Like

Thanks, @cpbl. This is enough of an error report for us to work with and we appreciate it. I consider this a major bug to be displaying 0.0000 when the true answer is 0.033.

I think what you’ll find with open source is that users are the ones who tend to find these edge case bugs. The cost on our side is the same for responding to a bug report through GitHub as responding to the mailing list, so please use whichever mode you’re most comfortable with.

Could I ask if you could share a CSV file of draws that had this bad behavior? I may be missing something, but I don’t see anything in the comments from @WardBrian or @mitzimorris that would explain why you got 0.00 for the sd of posterior_fraclow rather than 0.033, which should be the answer with two digits of precision.

There are a lot of moving parts here getting results through to cmdstanpy, starting from cmdstan’s input of the CSV output from sampling, its analysis of the summary statistics, its output to csv, cmdstanpy’s ingestion of the csv, to storing in pandas, to final print output using pandas precision controls. I’m afraid the display problem isn’t really solvable, because there’s no precision that makes sense for any given column or for any given row. Nevertheless, if we’re going to report 5 digits of precision, as in 0.0000 and they’re all zeros, then the result better be smaller than 0.00005.

the problem for CmdStanPy is here: cmdstanpy/mcmc.py at 6edb7ade2ff1eb44129f2220ecaf01b792de4a51 · stan-dev/cmdstanpy · GitHub

so yes, the docs are wrong and we need to get rid of default 2.

Thanks, everyone.
@Bob_Carpenter : I’ve been using exclusively an open source OS since I stopped using commercial POSIX two decades ago, and have submitted many bugs for various OSS as a user. I was simply asking for my correspondent, or someone, to agree that there was a problem before I put more time in. (I’ve really gone on a poor track with Stan by choosing python. I’ve now lost weeks (?!) of work to the pystan2 to pystan3 conversion and then giving up on both for cmdstanpy, assuming maybe it woudl be more stable (ie unchanging) than pystan had been, and somehow more robust, and now finding what is apparently an “edge case” in my first results using the latter. For other Python users approaching Stan, I would recommend learning R.) This is not a complaint; it is one person’s inference about where the largest user pool is, which is where OSS users want to place themselves.

I would like to make sure that my experience gets properly understood as three problems:

  1. incorrect use or application of the terminology “significant figures” (changing the behaviour to true sig figs would be better than changing the language to “decimal places precision”, but either is okay)

  2. Not only is it not sig figs, it’s not decimal precision either. 0.000 is not a 2-decimal place approx of 0.033.

  3. If the default precision is two decimal places, it’s insufficient (a poor default)

Yes @Bob_Carpenter I will supply some data if that’s still useful.

Thank you!!
c

1 Like

I do apologize for my initial misnderstanding of the issue that you reported.

we would very much like to figure out # 2.

problem 3 - setting defaults is tricky. Andrew Gelman recommends no more than 2 decimal places in most cases, but as you have demonstrated, this can be problematic.

1 Like

@mitzimorris Thanks.

In my view, # 1 is just as important, since it may be a cause of as much of my lost time as # 2.

for # 3: Really does Andrew recommend the same decimal precision for a mean as a sd? Computational issues are far out of my expertise, but it’s hard to imagine how sig figs are not always a better rule than decimal precision.

@Bob_Carpenter I’m attaching the list of samples of the relevant column (tab separated). I’m not certain whether that’s all you wanted.?
tmp-stan-samples.csv (54.5 KB)

thank you,
Chris

Is this logged as a bug somewhere, or is there likely to have been progress on it? The conversation above is 8 months old.
I’m finding another time-sucking weirdness using CmdStanModel.generate_quantities.draws_pd() … maybe I am supposed to be avoiding that too?