PSA: Sufficient-Stats for Gaussian Likelihoods should speed up most Hierarchical models with IID observation subgroups

I previously posted on a speedup for Gaussian likelihoods that is akin to the sufficient-stats trick for binomial likelihoods, but at the time I thought it was a trick that would rarely prove useful. Yesterday I realized it might be worthwhile in common hierarchical scenarios, so I threw together a benchmark, and indeed my hunch was correct:

Now, this speeds up only the likelihood part of a model, so don’t expect total run times of real-world models to achieve the above speedup factors, but it’s a pretty easy trick to apply & worth adding in most scenarios.
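For anyone who wants to convince themselves of the underlying identity before trusting the Stan code: here's a quick scipy sketch (my own, not part of the benchmark) checking that the full per-observation Gaussian log-likelihood equals the mean-plus-variance sufficient-stats form up to an additive constant that doesn't depend on the parameters:

```python
import numpy as np
from scipy.stats import norm, gamma

rng = np.random.default_rng(1)
n = 50
y = rng.normal(0.3, 1.2, size=n)
ybar, s2 = y.mean(), y.var(ddof=1)  # sample mean & (unbiased) sample variance

def full_loglik(mu, sigma):
    # standard approach: one normal term per observation
    return norm.logpdf(y, mu, sigma).sum()

def suff_loglik(mu, sigma):
    # sufficient-stats approach:
    #   ybar ~ Normal(mu, sigma/sqrt(n))
    #   s2   ~ Gamma(shape=(n-1)/2, rate=(n-1)/(2*sigma^2))
    # (scipy's gamma takes scale = 1/rate)
    return (norm.logpdf(ybar, mu, sigma / np.sqrt(n))
            + gamma.logpdf(s2, a=(n - 1) / 2, scale=2 * sigma**2 / (n - 1)))

# the two differ only by a constant that is the same for any (mu, sigma):
d1 = full_loglik(0.0, 1.0) - suff_loglik(0.0, 1.0)
d2 = full_loglik(0.7, 2.5) - suff_loglik(0.7, 2.5)
print(np.isclose(d1, d2))  # True
```

Because the difference is parameter-free, dropping it (as `_lupdf` does in Stan) leaves the posterior unchanged while replacing n normal terms with one normal term and one gamma term per group.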

I’ll paste the Stan model used for the benchmark below but here are links to the Stan and R code for the benchmark:
main.r (2.4 KB)
plot.r (2.0 KB)
profiling_hierarchical_std_vs_suf.stan (3.4 KB)
sample_and_save.r (1.3 KB)

data{
	// k: number of nominal levels of observation-groupings
	int<lower=1> k ;
	// n: number of observations made per level of k
	int<lower=1> n ;
	// y: n-by-k array of observations
	array[n,k] real y ;
}
transformed data{
	// to make for a comparison that is biased in favor of the standard/traditional
	// approach, we'll flatten y to a vector for a single call to normal_lupdf.
	vector[k*n] y_vec = to_vector(to_array_1d(y)) ;
	// Vectorization of the standard approach requires an index variable
	// that keeps track of which observation-group each observation is
	// associated with. This would be passed in by the user, but we'll create it
	// here by first creating a 2D array akin to y but containing entries
	// consisting of the column index repeated across rows, then flattening this
	// array in the same way we flattened y:
	array[n,k] int k_for_y ;
	for(i_k in 1:k){
		k_for_y[,i_k] = rep_array(i_k,n) ;
	}
	array[k*n] int k_for_y_vec = to_array_1d(k_for_y) ;
	// The new "sufficient" approach requires collapsing the data y to a mean &
	// variance per observation-group. To further bias the comparison in favor
	// of the standard approach, we'll pretend the user is relating these
	// summaries to the model parameters via an indexing array, which is useful
	// in the context of more complex models & missing data. Here the indexing
	// array will simply be of length k and consist of the sequence 1 through k.
	vector[k] obs_mean ;
	vector[k] obs_var ;
	array[k] int one_thru_k ;
	for(i_k in 1:k){
		one_thru_k[i_k] = i_k ;
		obs_mean[i_k] = mean(y[,i_k]) ;
		obs_var[i_k] = variance(y[,i_k]) ;
	}
	// The new sufficient approach uses pre-computable quantities involving the
	// number of observations contributing to each summary. While this is constant
	// in this demo at n, we'll account for potential slowdowns associated with
	// user data having a unique number of observations per summary by using a
	// vector of values for each of the following quantities, repeating the
	// quantity k times.
	vector[k] sqrt_n = rep_vector(sqrt(n*1.0),k) ;
	vector[k] sd_gamma_shape = rep_vector((n - 1.0) / 2.0,k) ;
}
parameters{
	// plain ol' mean-and-sd model for each observation-group with no pooling
	vector[k] pop_mean ;
	vector<lower=0>[k] pop_sd ;
}
model{
	// priors for the parameters:
	pop_mean ~ std_normal() ;
	pop_sd ~ weibull(2,1) ;
	// so that we don't increment target twice, collect the log-likelihoods in
	// two separate local variables inside profile statements to time each:
	real target1 = 0 ;
	real target2 = 0 ;
	profile("std"){
		target1 += normal_lupdf(y_vec | pop_mean[k_for_y_vec], pop_sd[k_for_y_vec]) ;
	}
	profile("suf"){
		target2 += normal_lupdf( obs_mean | pop_mean[one_thru_k] , pop_sd[one_thru_k]./sqrt_n ) ;
		target2 += gamma_lupdf( obs_var | sd_gamma_shape , sd_gamma_shape ./ pow(pop_sd[one_thru_k],2) ) ;
	}
	// only use the standard-computed log-likelihood to increment the target:
	target += target1 ;
}

// If we wanted to check that target1 & target2 differ only by a constant (and
// thus imply the same posterior), we could have the following GQ section
// recomputing them with the constant-preserving lpdf variants:

// generated quantities{
// 	real target1 = 0 ;
// 	real target2 = 0 ;
// 	target1 += normal_lpdf(y_vec |pop_mean[k_for_y_vec],pop_sd[k_for_y_vec]) ;
// 	target2 += normal_lpdf( obs_mean | pop_mean[one_thru_k] , pop_sd[one_thru_k]./sqrt_n ) ;
// 	target2 += gamma_lpdf(obs_var | sd_gamma_shape , sd_gamma_shape ./ pow(pop_sd[one_thru_k],2) ) ;
// }


Nice PSA! Sufficient statistics are a cool way to speed up likelihood computation. Indeed Gaussian linear regression also admits sufficient statistics. You can just precompute \hat{\beta} = (X^T X)^{-1} X^T y instead of doing the computation for each individual data point. Useful if you’re doing a hierarchical regression over different groups. See page 355 of BDA3 for the derivation.
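To make that concrete, here's a small numpy sketch (mine, not the BDA3 derivation itself) of the decomposition that makes the regression sufficient stats work: the sum of squares for any candidate beta splits into the residual sum of squares at \hat{\beta} plus a quadratic in (beta - \hat{\beta}), so after precomputing \hat{\beta}, X^T X, and the RSS once, evaluating the likelihood never touches the individual data points again:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(size=100)

# precomputed once, up front:
XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)   # OLS solution (X^T X)^-1 X^T y
rss = np.sum((y - X @ beta_hat) ** 2)      # residual sum of squares

beta = np.array([0.8, 0.1, 1.5])           # arbitrary candidate parameter vector
lhs = np.sum((y - X @ beta) ** 2)                        # touches all n rows
rhs = rss + (beta - beta_hat) @ XtX @ (beta - beta_hat)  # only p-by-p objects
print(np.isclose(lhs, rhs))  # True
```

The cross term vanishes because the residuals y - X\hat{\beta} are orthogonal to the columns of X, which is exactly why \hat{\beta} and the RSS are sufficient.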


Minor update: I worked out the Jacobian correction needed to use the chi-square rather than the gamma distribution for the variance, which achieves a further 1.2x-1.3x speedup:

And here’s the code for the chi-square likelihood:

	// assumes transformed data also defines, analogous to sqrt_n above:
	// vector[k] n_minus_one = rep_vector(n - 1.0, k) ;
	target += (
		normal_lupdf(
			obs_mean
			| pop_mean[one_thru_k]
			, pop_sd[one_thru_k]./sqrt_n
		)
		+ chi_square_lupdf(
			obs_var .* n_minus_one ./ pow(pop_sd[one_thru_k],2)
			| n_minus_one
		)
		// log-Jacobian of the map obs_var -> obs_var*(n-1)/pop_sd^2,
		// dropping the constant sum(log(n_minus_one)):
		- 2*sum(log(pop_sd[one_thru_k]))
	)  ;
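The equivalence is easy to check numerically; here's a scipy sketch (scalars for n, the sample variance, and sigma, for simplicity) showing that the chi-square form plus the log-Jacobian term matches the gamma form used in the original model:

```python
import numpy as np
from scipy.stats import chi2, gamma

n, s2, sigma = 30, 1.7, 1.3
df = n - 1

# gamma form (as in the original model):
#   s2 ~ Gamma(shape=(n-1)/2, rate=(n-1)/(2*sigma^2)); scipy scale = 1/rate
lp_gamma = gamma.logpdf(s2, a=df / 2, scale=2 * sigma**2 / df)

# chi-square form: df*s2/sigma^2 ~ chi_square(df), plus the log-Jacobian
# of the map s2 -> df*s2/sigma^2, i.e. log(df/sigma^2)
lp_chi2 = chi2.logpdf(df * s2 / sigma**2, df) + np.log(df / sigma**2)

print(np.isclose(lp_gamma, lp_chi2))  # True
```

The term that varies with the parameters is only the -2*log(sigma) piece of the Jacobian, which is why it's the part that must stay in the Stan model while log(n-1) can be dropped.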
