New reduce_sum makes cross-validation simple; should we standardise?

wds15 · April 15, 2020, 10:00am

Hi!

The 2.23 release brings us reduce_sum, which enables efficient and scalable within-chain parallelisation in Stan. I am fairly sure this will change the way we write models that have to deal with larger amounts of data, since they will be adapted to the reduce_sum facility. Have a look at a simple example. I just converted a categorical_logit model to reduce_sum and the speedup is close to linear in the number of cores (90s runtime => 31s on 3 cores).

The key idea of reduce_sum is to ask users to write a function which calculates partial sums of the log-likelihood. Say you have N data items and the log-likelihood term of each data point is calculated independently of the others. The user then provides a function which sums the terms in the index range start to end, so that the big reduction can be split into partial sums. The original sum over all data items can therefore be written (using the example linked) as:

target += partial_sum(1, N, n_redcards, n_games, rating, beta); // Sum terms 1 to N in the likelihood
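To make the partial-sum idea concrete, here is a minimal sketch in Python rather than Stan, so that it runs stand-alone. The names mirror the call above and the argument order follows this post, not necessarily the released Stan signature. The binomial-logit likelihood, the data values and the fixed grainsize chunking are all assumptions for illustration. It only shows that summing over consecutive chunks gives the same result as one sum over 1 to N.

```python
import math

def binomial_logit_lpmf(k, n, alpha):
    """Log-probability of k successes in n trials with success log-odds alpha."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    # log(p) = -log1p(exp(-alpha)) and log(1 - p) = -log1p(exp(alpha))
    return (log_choose
            - k * math.log1p(math.exp(-alpha))
            - (n - k) * math.log1p(math.exp(alpha)))

def partial_sum(start, end, n_redcards, n_games, rating, beta):
    """Sum the log-likelihood terms start..end (1-based, inclusive)."""
    return sum(
        binomial_logit_lpmf(n_redcards[j], n_games[j], beta[0] + beta[1] * rating[j])
        for j in range(start - 1, end)
    )

# Made-up data and parameter values (assumptions, not from the linked example)
n_redcards = [0, 1, 0, 2, 1, 0, 3]
n_games = [10, 12, 8, 20, 15, 9, 25]
rating = [0.1, 0.4, 0.2, 0.9, 0.5, 0.0, 1.0]
beta = [-3.0, 1.5]
N = len(n_redcards)

# One big sum over all terms
full = partial_sum(1, N, n_redcards, n_games, rating, beta)

# The same sum, split into consecutive chunks of at most `grainsize` terms
grainsize = 3
chunks = [(s, min(s + grainsize - 1, N)) for s in range(1, N + 1, grainsize)]
chunked = sum(partial_sum(s, e, n_redcards, n_games, rating, beta) for s, e in chunks)

print(chunks)
print(abs(full - chunked) < 1e-9)
```

Each chunk depends only on its own index range, which is what allows the chunks to be evaluated in parallel.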

With such a function now present in many Stan models, it is worth exploring what else it can be used for. The point here is that writing leave-something-out cross-validation now becomes a very simple matter of

target += partial_sum(1, i-1, n_redcards, n_games, rating, beta) +
 partial_sum(i+1, N, n_redcards, n_games, rating, beta);

which would leave out element i from the log likelihood.
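As a small sanity check of the two-call pattern above, here is a hedged Python sketch. It assumes the pointwise log-likelihood terms are already available as a plain list (the values are made up) and that partial_sum returns 0 for an empty range. That second assumption matters at the boundaries: for i = 1 the first call covers the range 1 to 0, and for i = N the second call covers N+1 to N.

```python
def partial_sum(start, end, log_lik_terms):
    """Sum terms start..end (1-based, inclusive); an empty range gives 0."""
    return sum(log_lik_terms[start - 1:end])

def leave_one_out(i, log_lik_terms):
    """Log-likelihood with element i (1-based) left out, using two partial sums."""
    N = len(log_lik_terms)
    return partial_sum(1, i - 1, log_lik_terms) + partial_sum(i + 1, N, log_lik_terms)

# Made-up pointwise log-likelihood values (hypothetical)
log_lik_terms = [-1.0, -0.5, -2.0, -0.25]

for i in range(1, len(log_lik_terms) + 1):
    print(i, leave_one_out(i, log_lik_terms))
```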

Doing this for hierarchical models is easy as well, provided you order the observations so that each group you wish to leave out occupies a contiguous index range.
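The grouped case can be sketched the same way. The Python below is only an illustration: group_range and leave_group_out are hypothetical helper names, the group labels and log-likelihood values are made up, and the observations are assumed to be sorted by group already. The helper refuses to proceed if a group is not contiguous, because the two-call trick would then silently leave out the wrong terms.

```python
def partial_sum(start, end, log_lik_terms):
    """Sum terms start..end (1-based, inclusive); an empty range gives 0."""
    return sum(log_lik_terms[start - 1:end])

def group_range(group, g):
    """Return the (first, last) 1-based positions of group g.

    Raises ValueError if g is absent or its observations are not contiguous.
    """
    idx = [j + 1 for j, label in enumerate(group) if label == g]
    if not idx:
        raise ValueError(f"group {g} not found")
    first, last = idx[0], idx[-1]
    if last - first + 1 != len(idx):
        raise ValueError(f"group {g} is not contiguous; sort the data by group first")
    return first, last

def leave_group_out(g, group, log_lik_terms):
    """Log-likelihood with all observations of group g left out."""
    N = len(log_lik_terms)
    first, last = group_range(group, g)
    return partial_sum(1, first - 1, log_lik_terms) + partial_sum(last + 1, N, log_lik_terms)

# Made-up group labels (sorted) and pointwise log-likelihood values
group = [1, 1, 2, 2, 2, 3]
log_lik_terms = [-1.0, -0.5, -2.0, -0.25, -0.75, -1.5]

for g in (1, 2, 3):
    print(g, group_range(group, g), leave_group_out(g, group, log_lik_terms))
```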

The reason I am bringing this up is that I would be very interested in seeing this standardised so that our R packages (loo, rstanarm, brms) can take advantage of it. With reduce_sum we will have more structure in how the log-likelihood is evaluated, and I would hope that standardising this can take us very far. Maybe this is old news for many, but as we just created reduce_sum, I thought that the Pinocchio principle could come to life here.

Tagging @Stan_Development_Team to have a look.

Best,
Sebastian
