Standard Error for least squares vs standard deviation of parameters from Stan


Least squares fit provides point estimates of parameters along with the covariance matrix over the parameters estimates. One can compute standard deviation on the individual parameters by taking a square root of the diagonal values of this matrix. One can use scipy’s curve_fit routine for this.

On the other hand, modeling the regression using a Bayesian approach outputs a distribution of parameters along with the mean, standard deviation and interval estimates (5%, 95%, 50% etc.).

Question - Can we compare the standard deviation of individual parameters from the curve_fit routine with the standard deviation obtained from the posterior distribution of parameters from Stan.


Formally, no in general ,as the two outputs have different mathematical interpretations, but yes in some special limits. Informally, however, they are often compared without considering the more technical details.

In particular, the people tend to intuitively think in terms of distributions over parameter space and so they actually misinterpret least squares fits as probabilistic results. From that perspective the Bayesian output is comparable to expected intuition but a much better characterization of the actual uncertainties in the fit.


Specifically, most maximum likelihood estimators (or least squares approximation thereof in non-normal cases) use the inverse of the Hessian (second derivative matrix) to estimate standard errors. You can then use that to get approximate confidence intervals. The interpretation is as usual for confidence intervals (variation in estimate due from sampling variation).

I think that’s just when the posterior is multivariate normal. Although that’s often true asymptotically, the whole reason we’re usually doing statistics is because we have a small sample of data with which we want to estimate (super)populations.