# PSA: Sufficient-Stats for Gaussian Likelihoods should speed up most Hierarchical models with IID observation subgroups

I previously posted about a speedup for Gaussian likelihoods that is akin to the sufficient-stats trick for binomial likelihoods, but at the time I thought it was a trick that would rarely prove useful. Yesterday I realized it might be worthwhile in common hierarchical scenarios, so I threw together a benchmark, and indeed my hunch was correct:

Now, this speeds up only the likelihood part of a model, so don’t expect total run times of real-world models to achieve the above speedup factors, but it’s a pretty easy trick to apply & worth adding in most scenarios.

I’ll paste the Stan model used for the benchmark below but here are links to the Stan and R code for the benchmark:
main.r (2.4 KB)
plot.r (2.0 KB)
profiling_hierarchical_std_vs_suf.stan (3.4 KB)
sample_and_save.r (1.3 KB)

data{
// k: number of nominal levels of observation-groupings
int<lower=1> k ;
// n: number of observations per level of k
int<lower=1> n ;
// y: n-by-k array of observations
array[n,k] real y ;
}
transformed data{
// to make for a comparison that is biased in favor of the standard/traditional
// approach, we'll flatten y to a vector for a single call to normal_lupdf.
vector[k*n] y_vec = to_vector(to_array_1d(y)) ;
// Vectorization of the standard approach requires an index variable that
// keeps track of the observation-group with which each observation is
// associated. This would be passed in by the user, but we'll create it
// here by first creating a 2D array akin to y but containing entries
// consisting of the column index repeated across rows, then flattening this
// array in the same way we flattened y:
array[n,k] int k_for_y ;
for(i_k in 1:k){
k_for_y[,i_k] = rep_array(i_k,n) ;
}
array[k*n] int k_for_y_vec = to_array_1d(k_for_y) ;
// The new "sufficient" approach requires collapsing the data y to a mean &
// variance per observation-group. To further bias the comparison in favor
// of the standard approach, we'll pretend the user is relating these
// summaries to the model parameters via an indexing array, which is useful
// in the context of more complex models & missing data. Here the indexing
// array will simply be of length k and consist of the sequence 1 through k.
vector[k] obs_mean ;
vector[k] obs_var ;
array[k] int one_thru_k ;
for(i_k in 1:k){
one_thru_k[i_k] = i_k ;
obs_mean[i_k] = mean(y[,i_k]) ;
obs_var[i_k] = variance(y[,i_k]) ;
}
// The new sufficient approach uses pre-computable quantities involving the
// number of observations contributing to each summary. While this is constant
// in this demo at n, we'll account for potential slowdowns associated with
// user data having a unique number of observations per summary by using a
// vector of values for each of the following quantities, repeating the
// quantity k times.
vector[k] sqrt_n = rep_vector(sqrt(n*1.0),k) ;
vector[k] sd_gamma_shape = rep_vector((n - 1.0) / 2.0,k) ;
}
parameters{
// plain ol' mean-and-sd model for each observation-group with no pooling
vector[k] pop_mean ;
vector<lower=0>[k] pop_sd ;
}
model{
// priors for the parameters:
pop_mean ~ std_normal() ;
pop_sd ~ weibull(2,1) ;
// so that we don't increment target twice, collect the log-likelihoods in
// two separate local variables inside profile statements to time each:
real target1 = 0 ;
real target2 = 0 ;
profile("std"){
target1 += normal_lupdf(y_vec | pop_mean[k_for_y_vec], pop_sd[k_for_y_vec]) ;
}
profile("suf"){
target2 += normal_lupdf(obs_mean | pop_mean[one_thru_k], pop_sd[one_thru_k] ./ sqrt_n) ;
target2 += gamma_lupdf(obs_var | sd_gamma_shape, sd_gamma_shape ./ pow(pop_sd[one_thru_k],2)) ;
}
// only use the standard-computed log-likelihood to increment the target:
target += target1 ;
}

// If we wanted to check that target1 & target2 are precisely proportional, we
// could have the following GQ section recomputing them:

// generated quantities{
// 	real target1 = 0 ;
// 	real target2 = 0 ;
// 	target1 += normal_lpdf(y_vec |pop_mean[k_for_y_vec],pop_sd[k_for_y_vec]) ;
// 	target2 += normal_lpdf( obs_mean | pop_mean[one_thru_k] , pop_sd[one_thru_k]./sqrt_n ) ;
// 	target2 += gamma_lpdf(obs_var | sd_gamma_shape , sd_gamma_shape ./ pow(pop_sd[one_thru_k],2) ) ;
// }
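As a sanity check on the factorization itself (outside Stan): for IID normal data, the joint likelihood as a function of (mu, sigma) factors into a normal density for the sample mean and a gamma density for the sample variance, matching the shape/rate parameterization used in the model above. Here's a minimal pure-Python sketch, with hand-rolled logpdfs standing in for Stan's `normal_lupdf`/`gamma_lupdf` (the two log-likelihoods should differ only by a data-dependent constant):

```python
import math, random

random.seed(0)
n = 50
y = [random.gauss(1.3, 0.7) for _ in range(n)]
ybar = sum(y) / n
s2 = sum((yi - ybar)**2 for yi in y) / (n - 1)  # sample variance (ddof = 1)

def norm_logpdf(x, mu, sigma):
    return -math.log(sigma * math.sqrt(2 * math.pi)) - (x - mu)**2 / (2 * sigma**2)

def gamma_logpdf(x, shape, rate):
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1) * math.log(x) - rate * x)

def full_ll(mu, sigma):
    # standard approach: one normal term per observation
    return sum(norm_logpdf(yi, mu, sigma) for yi in y)

def suf_ll(mu, sigma):
    # sufficient-stats approach: normal for the mean, gamma for the variance
    a = (n - 1) / 2
    return (norm_logpdf(ybar, mu, sigma / math.sqrt(n))
            + gamma_logpdf(s2, a, a / sigma**2))

# the difference should be the same constant at every (mu, sigma)
diffs = [full_ll(m, s) - suf_ll(m, s)
         for m, s in [(0.0, 1.0), (1.3, 0.7), (-2.0, 2.5)]]
print(all(abs(d - diffs[0]) < 1e-8 for d in diffs))  # prints True
```

Since the difference is constant in the parameters, the two targets yield identical posteriors, which is exactly what the commented-out GQ check above would confirm.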



Nice PSA! Sufficient statistics are a cool way to speed up likelihood computation. Indeed, Gaussian linear regression also admits sufficient statistics: you can precompute \hat{\beta} = (X^T X)^{-1} X^T y rather than doing the computation for each individual data point. Useful if you’re doing a hierarchical regression over different groups. See page 355 of BDA3 for the derivation.
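For concreteness, the regression case only needs the precomputed products X'X, X'y, and y'y: the residual sum of squares for any candidate beta can be recovered from those alone, with no per-observation loop. A rough pure-Python sketch with a made-up toy design (not code from the post above):

```python
import random

random.seed(1)
n = 40
# toy design: intercept plus one predictor
X = [[1.0, random.uniform(-1, 1)] for _ in range(n)]
beta_true = [0.5, -1.2]
y = [sum(xj * bj for xj, bj in zip(x, beta_true)) + random.gauss(0, 0.3)
     for x in X]

# sufficient statistics, computed once: X'X (2x2), X'y (2), y'y (scalar)
XtX = [[sum(x[i] * x[j] for x in X) for j in range(2)] for i in range(2)]
Xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(2)]
yty = sum(yi * yi for yi in y)

def rss_full(beta):
    # per-observation computation
    return sum((yi - sum(xj * bj for xj, bj in zip(x, beta)))**2
               for x, yi in zip(X, y))

def rss_suf(beta):
    # (y - Xb)'(y - Xb) = y'y - 2 b'X'y + b'X'X b
    quad = sum(beta[i] * XtX[i][j] * beta[j]
               for i in range(2) for j in range(2))
    return yty - 2 * sum(bi * gi for bi, gi in zip(beta, Xty)) + quad

print(abs(rss_full([0.4, -1.0]) - rss_suf([0.4, -1.0])) < 1e-9)  # prints True
```

Since the Gaussian log-likelihood depends on the data only through this residual sum of squares (and n), the collapsed form gives the same posterior while replacing an O(n) loop per leapfrog step with a handful of fixed-size matrix products.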


Minor update: I worked out the Jacobian correction necessary to use the chi-square rather than the gamma distribution for the variance, which achieves a further 1.2x-1.3x speedup:

And here’s the code for the chi-square likelihood:

	target += (
		normal_lupdf(
			obs_mean
			| pop_mean[one_thru_k]
			, pop_sd[one_thru_k] ./ sqrt_n
		)
		+ (
			(
				chi_square_lupdf(
					obs_var ./ pow(pop_sd[one_thru_k],2)
					| 1
				)
				- 2*sum(log(pop_sd[one_thru_k]))
			)
			* k_minus_one
		)
	) ;
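The underlying change of variables can be checked numerically: if V = (n-1)s²/σ² ~ χ²_{n-1}, then the log-density of s² is the χ² log-density evaluated at V plus the Jacobian term log((n-1)/σ²), and this agrees exactly with the gamma(shape=(n-1)/2, rate=(n-1)/(2σ²)) form used earlier in the thread. A pure-Python sketch of that textbook relationship (note this verifies the chi-square/gamma equivalence itself, not the exact scaling used in the Stan snippet above):

```python
import math

def gamma_logpdf(x, shape, rate):
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1) * math.log(x) - rate * x)

def chi2_logpdf(v, df):
    return ((df / 2 - 1) * math.log(v) - v / 2
            - (df / 2) * math.log(2) - math.lgamma(df / 2))

# arbitrary test point: n observations, true sd sigma, sample variance s2
n, sigma, s2 = 20, 0.8, 0.55

c = (n - 1) / sigma**2                 # V = c * s2, so the Jacobian is log(c)
lp_gamma = gamma_logpdf(s2, (n - 1) / 2, (n - 1) / (2 * sigma**2))
lp_chi2 = chi2_logpdf(c * s2, n - 1) + math.log(c)

print(abs(lp_gamma - lp_chi2) < 1e-12)  # prints True
```

The agreement here is exact (not just up to a constant), so swapping in the chi-square form changes only the arithmetic cost per evaluation, not the posterior.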

