PSA: using stantargets for SBC(-esque) checks, & big speedups for big gaussian likelihoods

So my recent playing with alternatives to the typical multivariate normal led me to realize that there's a trick for more efficiently incrementing the target for Gaussian outcomes, akin to what the Stan User's Guide (SUG) describes as exploiting "sufficient statistics" in the case of Bernoulli outcomes.

For the Gaussian case, a minimal model would be:

data{
	int<lower=1> n ;
	vector[n] y ;
}
parameters{
	real pop_mean ;
	real<lower=0> pop_sd ;
}
model{
	pop_mean ~ std_normal() ;
	pop_sd ~ weibull(2,1) ;
	y ~ normal(pop_mean,pop_sd) ;
}

The final line involves computing the log-prob for each of the n values in y.

But we can reduce that to just two log-prob evaluations by pre-computing the mean and variance of y and then using the sampling distributions of those two quantities. Most folks are familiar with the normal sampling distribution of the mean, and @vients educated me on the sampling distribution of the variance here a while back. With the same data and parameters sections as above, a model implementing this idea is:

...
transformed data{
	// sufficient statistics of the data, computed once
	real obs_mean = mean(y) ;
	real obs_var = variance(y) ;
	real sqrt_n = sqrt(n) ;
	real gamma_shape = (n - 1) / 2.0 ; // 2.0 avoids integer division
}
...
model{
	pop_mean ~ std_normal() ;
	pop_sd ~ weibull(2,1) ; // peaked ~0.8, zero-at-zero
	obs_mean ~ normal( pop_mean , pop_sd / sqrt_n ) ; // sampling distribution of the sample mean
	obs_var  ~ gamma( gamma_shape , gamma_shape / pow(pop_sd,2) ) ; // sampling distribution of the sample variance (shape & rate)
}
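
(For reference, and in case I've garbled the algebra, the standard results I'm leaning on here are, for IID $y_i \sim \text{Normal}(\mu,\sigma)$ with sample mean $\bar{y}$ and sample variance $s^2$:

$$
\bar{y} \sim \text{Normal}\!\left(\mu,\ \tfrac{\sigma}{\sqrt{n}}\right)
\qquad\text{and}\qquad
\frac{(n-1)\,s^2}{\sigma^2} \sim \chi^2_{n-1}
\ \Longleftrightarrow\
s^2 \sim \text{Gamma}\!\left(\tfrac{n-1}{2},\ \tfrac{n-1}{2\sigma^2}\right),
$$

where the Gamma is in Stan's shape/rate parameterization; that's where gamma_shape and the gamma_shape / pow(pop_sd,2) rate come from.)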

Now, while the above made sense to me mathematically, I don't have a strong background in formal maths, so I wanted to check that it truly yields equivalent behaviour (presumably "up to a constant"). For that I turned to the super cool Simulation-Based Calibration (SBC) idea, which I implemented quickly with the nifty stantargets package. I'd recently transitioned a bunch of my work at my day-job to the awesome targets framework, and stantargets looked super promising as a way to quickly implement SBC-like checks.

In addition to the above-noted uncertainty about whether I had the math right, I also suspected I might need to explore extreme data scenarios to actually see much of a compute-speed benefit from this trick: I guessed that the standard normal computations are fairly well-optimized, while the gamma might not be (plus there's an exponentiation and two divisions not present in the typical model). So I used stantargets to iterate over a logarithmic range of data sizes, at each size generating 1000 data sets with true parameters sampled from the priors of the models above, and for each simulated data set I sampled both the "typical" and the new "sufficient" models, yielding posteriors for both the pop_mean and pop_sd parameters.
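
For anyone wanting a feel for what that looks like, here's a rough sketch of the kind of target I used (simplified relative to the attached files; the data-generating function, batch/rep counts, and paths are placeholders, and I'm going from memory on the .join_data mechanism for carrying the true values through, so check the stantargets docs):

# _targets.R (sketch)
library(targets)
library(stantargets)

# simulate one data set, with true parameters drawn from the models' priors
generate_data = function(n){
	pop_mean = rnorm(1, mean = 0, sd = 1)
	pop_sd = rweibull(1, shape = 2, scale = 1)
	list(
		n = n,
		y = rnorm(n, pop_mean, pop_sd),
		# true values get joined onto the output summaries for later checking
		.join_data = list(pop_mean = pop_mean, pop_sd = pop_sd)
	)
}

list(
	tar_stan_mcmc_rep_summary(
		sim,
		stan_files = c('stan_code/typical.stan', 'stan_code/sufficient.stan'),
		data = generate_data(n = 1e3), # runs once per rep; n looped over the logarithmic range of sizes
		batches = 100, # number of branch targets
		reps = 10,     # fits per branch; 100 x 10 = 1000 data sets
		variables = c('pop_mean', 'pop_sd'),
		summaries = list(
			~posterior::quantile2(.x, probs = c(.1, .25, .75, .9))
		)
	)
)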

Proper/full SBC would then, for each data set and parameter, compute the rank of the true parameter value among the posterior draws; across data sets, those ranks should be uniformly distributed if the model is well-calibrated. stantargets doesn't have this functionality yet, but a rough approximation to this best practice is instead to check, for each data set and parameter, whether the posterior central X% interval captures ("recovers") the true value, in which case a well-calibrated model should yield, across data sets, an X% recovery rate.
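
Concretely, assuming you've bound the per-model summaries into one data frame (I'll call it sim here) with the true value in a .join_data column, the quantile2 output in q10/q25/q75/q90 columns, a variable column for the parameter, and a model column identifying which Stan file each row came from (all of these names are just my placeholders), the recovery check boils down to something like:

library(dplyr)

recovery = sim %>%
	group_by(model, variable) %>%
	summarise(
		recovered_50 = mean(.join_data >= q25 & .join_data <= q75),
		recovered_80 = mean(.join_data >= q10 & .join_data <= q90),
		.groups = 'drop'
	)

# a well-calibrated model should show ~50% and ~80% recovery, respectively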

For this exploration I somewhat arbitrarily chose to look at the 50% and 80% intervals, and as you can see, both models seem well-calibrated in the sense that there are no clear systematic deviations from expectations:
[recovery plot]
(dark-grey bands show 89%ile bounds for expectations at each interval level)

More excitingly (though qualified, see below), for larger data sizes, there’s a pretty dramatic speed improvement for the new approach (n.b. both axes on log-scale):
[timing plot]
(ribbons show the 25%ile-to-75%ile of timings)

But note that I think it's going to be pretty rare for a real-world dataset to be at a scale where these performance improvements matter. The trick only applies to cases where you can express an estimate via a single combination of mean and variance for a set of otherwise IID samples, and I think it'll be rare that any real-world quantity gets measured 100k+ times in a truly IID manner, let alone that folks with such data would care that sampling takes 10s rather than the more usual <1s. But I could be limited in my ability to imagine applications for this, and besides, I thought it might be a helpful case-study of sorts on SBC(-like) model checking and using stantargets.

I'll attach the Stan and R files below. Note that they expect a folder structure like:

.
├── _imports.r
├── main.r
├── r_helpers
│   ├── generate_data.r
│   └── install_if_missing.r
├── _.rproj
├── stan_code
│   ├── sufficient.stan
│   └── typical.stan
└── _targets.R

install_if_missing.r (1011 Bytes)
generate_data.r (204 Bytes)
_imports.r (828 Bytes)
_targets.R (952 Bytes)
main.r (2.9 KB)
typical.stan (327 Bytes)
sufficient.stan (421 Bytes)


Huh. While I previously dismissed this Gaussian sufficient-stats trick as unlikely to be useful given that it requires lots of IID data, it just occurred to me that any model with lots of data and a normal likelihood can make the data IID by subtracting the parameter reflecting the mean and then dividing by the parameter reflecting the standard deviation. I think there's probably a Jacobian adjustment needed thanks to those transforms that I'd need to work out…
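
(Thinking out loud, and someone please check me: for the transform $z_i = (y_i - \mu)/\sigma$, the change-of-variables adjustment should just be

$$
\sum_{i=1}^{n} \log\left|\frac{\partial z_i}{\partial y_i}\right| = -\,n \log \sigma \,,
$$

which matches normal_lpdf(y | mu, sigma) being exactly std_normal_lpdf((y - mu) / sigma) - n * log(sigma). The wrinkle I'd still need to think through is that the mean and variance of the z-scored values then depend on the parameters, so those sufficient stats would have to be recomputed in the model block every iteration rather than once in transformed data.)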