Multiple Runs of the Same Model Yield Different Estimates

Hi all,

If a model appears to have converged, why could running it multiple times give different results for the estimates?

Here’s a very simple toy example, as this seems to be an issue with all models I run:

library(brms)
data('iris')
brm(Sepal.Length ~ Sepal.Width, data = iris)
brm(Sepal.Length ~ Sepal.Width, data = iris)

The first run on my machine gives coefficient estimates of (6.54, -0.23), while the second run gives (6.52, -0.22). Rhat values are all 1.00, there are no warning messages, and the plot() output doesn’t show any major signs that the model hasn’t converged.

OS is Windows, brms version 2.17, R version 4.2.1.

Because it’s sampling-based estimation. For example, if I take 1000 draws from a standard normal distribution, I don’t expect them all to have a mean of 0.00000. Instead, it looks like this:

> for (n in 1:10) print(mean(rnorm(1000)))
[1] -0.008647283
[1] 0.004369155
[1] -0.02758793
[1] -0.07020262
[1] -0.03062315
[1] 0.02008949
[1] 0.009072343
[1] 0.04534396
[1] -0.02572225
[1] 0.02673742

And there’s not even convergence here—this is just 1000 independent draws.

What you want to look at in the Stan output is the MCMC standard error (MCSE). That tells you how accurate you can expect your mean estimates to be. It’s defined as the posterior standard deviation divided by the square root of the effective sample size (ESS). For example, in the 1000 draws from the standard normal, the sd is 1 and the ESS is 1000, so we expect errors distributed roughly as normal(0, 1/sqrt(1000)). For example,
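As a quick sketch of that formula, here is the MCSE computed directly for 1000 independent standard-normal draws (for independent draws, the ESS is just the number of draws):

```r
set.seed(123)

# 1000 independent draws from a standard normal
draws <- rnorm(1000)

# MCSE = posterior sd / sqrt(effective sample size);
# for independent draws, ESS = number of draws
mcse <- sd(draws) / sqrt(length(draws))
mcse  # close to 1 / sqrt(1000), i.e. about 0.032
```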

> for (n in 1:10) print(rnorm(1, 0, 1/sqrt(1000)))
[1] 0.02822878
[1] 0.03832842
[1] 0.02723462
[1] 0.03650974
[1] -0.007586212
[1] -0.03132589
[1] -5.446237e-05
[1] -0.006155
[1] 0.04067246
[1] -0.04534127

With more draws, they get closer together. For instance, if I take 100,000 draws from a standard normal, the error is 1/10th what it is with 1000 draws,

> for (n in 1:10) print(mean(rnorm(1e5)))
[1] 0.005991892
[1] 0.003730479
[1] 0.003209092
[1] -0.006674568
[1] -0.0009652304
[1] 0.001986868
[1] 0.003473231
[1] 0.0003379722
[1] -0.002987224
[1] 0.001434846

Basically, you need N^2 times as many draws to reduce the error by a factor of N.
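A minimal sketch checking that scaling empirically: quadrupling the number of draws should roughly halve the spread of the sample means.

```r
set.seed(123)

# Standard deviation of the sample mean, estimated across 200 repetitions
err <- function(n) sd(replicate(200, mean(rnorm(n))))

err(1000)  # roughly 1 / sqrt(1000), about 0.032
err(4000)  # roughly 1 / sqrt(4000), about 0.016 -- half the error for 4x the draws
```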

Welcome to Monte Carlo-based estimation.

[edit: first used 1e6 instead of 1e5 in final example, so edited to match the text]


Makes sense, thanks. I understood that sampling-based estimation leads to estimates with some sampling error, but I didn’t expect it to be so large in simple models, so I thought I might be doing something else wrong. I’ll use that information to define an acceptable margin of error.

Thanks again!

Bob illustrated the Monte Carlo variation well. I’m adding a link to a case study that shows how to compute the Monte Carlo standard error, use it to estimate which digits in an estimate can change when rerunning MCMC, and decide how many digits are useful to report.


An alternative strategy for reducing the Monte Carlo variance is to use control variates. We have put together a prototype repo to use them with Stan. See here: https://github.com/zhouyifan233/pystan2-ControlVariates

Comments welcome (e.g., here or as issues on the repo). Assuming the associated paper is published soon (or we put it on arXiv), we plan to put together a design doc, but we’d welcome feedback before then.


I get “404 Page not found”

Oops. Try this: https://github.com/zhouyifan233/PyStan_control_variate (implementation of control variates for PyStan)
