This is the key point. I would recommend simulating this to get a feel for what’s going on. Set up a simple regression model with simulated data. It can be as simple as estimating the mean of a normal distribution with a fixed standard deviation of 1.
data {
int<lower=0> N;
vector[N] y;
}
parameters {
real mu;
}
model {
y ~ normal(mu, 1);
}
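One nice feature of this toy model: mu gets Stan's implicit flat prior and the standard deviation is fixed at 1, so the posterior is available in closed form,

mu | y ~ normal(mean(y), 1 / sqrt(N)),

which lets us check the MCMC estimates against exact answers.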
From this model (with true mu = 0), I simulated 10 data points using Python:
>>> import numpy as np
>>> y = np.random.randn(10)
>>> y
array([ 1.01965314, -1.88385894, 0.83243863, -1.07899283, -0.46363943,
-0.10507314, 0.84011055, 1.29692573, 0.44435409, -0.41459175])
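For this particular draw, mean(y) ≈ 0.049, so the exact posterior is normal(0.049, 0.316). Keep those two numbers in mind when reading the summaries below.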
I then compiled and fit the model with CmdStanPy, using 100 sampling iterations, and printed the summary (with the Stan program above saved as, say, normal.stan):
>>> from cmdstanpy import CmdStanModel
>>> m = CmdStanModel(stan_file='normal.stan')
>>> f = m.sample(data={'N': 10, 'y': y}, iter_sampling=100, chains=1)
>>> f.summary()
Mean MCSE StdDev
mu 0.099474 0.037207 0.251690
The MCSE is the uncertainty in our estimate of 0.099 for the posterior mean. The StdDev is the posterior standard deviation of mu. We see that if we go up to 10,000 sampling iterations, the MCSE goes down, but not the posterior StdDev:
Mean MCSE StdDev
mu 0.04563 0.004718 0.311458
We expect the MCSE to go down by a factor of sqrt(10000/100) = 10 when moving from 100 to 10,000 draws (that's the theory), which is about what we see.
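As a quick check: 0.037207 / 0.004718 ≈ 7.9, in the right ballpark of the predicted 10. It isn't exact because the MCSE actually scales with the effective sample size of the draws (more on that below), which fluctuates from run to run.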
Now compare to what happens when I go from 10 to 10,000 observations, first with 100 MCMC iterations,
Mean MCSE StdDev
mu -0.000081 0.001603 0.009995
and then with 10,000:
Mean MCSE StdDev
mu 0.001594 0.000167 0.010147
Here’s a quick summary table that I hope makes things clear.
| data size | MCMC iterations | MC standard error | posterior standard deviation |
| --- | --- | --- | --- |
| 10 | 100 | 0.037 | 0.25 |
| 10 | 10,000 | 0.004 | 0.31 |
| 10,000 | 100 | 0.0016 | 0.010 |
| 10,000 | 10,000 | 0.00017 | 0.010 |
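If you want to reproduce the table, here is a minimal sketch of the experiment (again assuming the Stan program above is saved as normal.stan; exact numbers will vary from run to run):

import numpy as np
from cmdstanpy import CmdStanModel

m = CmdStanModel(stan_file='normal.stan')  # the model from above
for n_data in (10, 10_000):
    y = np.random.randn(n_data)  # simulate data with true mu = 0
    for n_iter in (100, 10_000):
        f = m.sample(data={'N': n_data, 'y': y}, iter_sampling=n_iter, chains=1)
        row = f.summary().loc['mu']  # same columns as the summaries above
        print(n_data, n_iter, row['MCSE'], row['StdDev'])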
This is confusing because both give you uncertainty around mu. The posterior standard deviation is your true, irreducible uncertainty about mu given the model and the data: no amount of extra MCMC will shrink it. The Monte Carlo standard error just gives you the uncertainty in your estimate of the posterior mean of mu. Technically,

MCSE = StdDev / sqrt(N_eff)

where N_eff is the effective sample size of the MCMC draws. We also expect the posterior standard deviation itself to shrink at a rate of 1 / sqrt(N), where N is the number of observations.
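Both of these relations are easy to sanity-check against the table. For example, taking the StdDev and MCSE from the 10-observation, 10,000-iteration run, and using the fact that the exact posterior standard deviation for this model is 1 / sqrt(N):

import numpy as np

# implied effective sample size, from MCSE = StdDev / sqrt(N_eff):
# works out to roughly 4,360 of the 10,000 draws
print((0.311458 / 0.004718) ** 2)

# exact posterior sd, 1 / sqrt(N): 0.316 for N = 10 and 0.01 for N = 10,000,
# matching the 0.31 and 0.010 in the table
print(1 / np.sqrt(10), 1 / np.sqrt(10_000))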