Optimizing sampling of a large Poisson model with a multivariate time series component

I’ve been working on a variation of a model that has been used to model the outcomes of games in some sports. The basic model starts like this:

g_h \sim \mathrm{Poisson}(\alpha_h \beta_a)\\ g_a \sim \mathrm{Poisson}(\alpha_a \beta_h)

Where g_h, g_a are the goals scored in a game by the home and the away team, respectively. The coefficients \alpha and \beta refer to the skill of the teams in offense and defense, respectively. Those parameters are estimated from the outcome of many games against different teams. A high \alpha means the team is strong on offense, whereas a low \beta indicates the team is a strong defender. All good so far.

I’ve made some adjustments to this, first I decompose the team level coefficients into the sums of coefficients of individual players, so:

\alpha_h = \sum_k \alpha_{h,k} \qquad \beta_h = \sum_k \beta_{h,k}

In my case, players play on different teams consistently so there’s interest in inference on a player by player basis. This obviously increases the number of parameters of the model a good bit, but no matter.

The second change I made is to model the parameters as a time series to infer changes in player performance/skill over time. My attempt is to model each player’s coefficients as follows:

\begin{pmatrix} \alpha_{t+1}\\ \beta_{t+1} \end{pmatrix} \sim \mathrm{Normal}\left( \begin{pmatrix} \alpha_t\\ \beta_t \end{pmatrix}, \begin{pmatrix} \sigma_\alpha & \rho \sigma_\alpha^2 \sigma_\beta\\ \rho \sigma_\alpha \sigma_\beta & \sigma_\beta^2 \end{pmatrix} \right)

This again increases the number of parameters as each player now has two parameters for each time

The end result of this is the model has gotten rather slow to sample with a few thousand games of data, a couple hundred players, and dozen or so time periods. So I’m curious if anyone could give me some strategies to optimize the code I’ve written to. I know things like vectorization and whatnot can help speed up these processes, but I’m still relatively new to writing larger models in Stan.

My full code is as follows:

data {
    int N; // num games
    int J; // num players
    int T; // time periods
    int<lower=1,upper=T> TT[N]; // time period index of game
    int X[N, 6]; // game player indices
    int G[N, 6]; // goals per player
transformed data {
    int GT[N, 2]; // goals per team
    for (n in 1:N) {
        GT[n,1] = sum(G[n,1:3]);
        GT[n,2] = sum(G[n,4:6]);
parameters {
    matrix<lower=0>[T, J] alpha; // attacking
    matrix<lower=0>[T, J] beta; // defending
    // time series params
    cholesky_factor_corr[2] L; // correlation
    vector<lower=0>[2] tau;  // variance
    // hyperparameters of player coefficients
    real<lower=0> alpha_a;
    real<lower=0> alpha_b;
    real<lower=0> beta_a;
    real<lower=0> beta_b;
transformed parameters {
    // These parameters select the correct player 
    // coefficients for each game, so we don't have to 
    // do it in the model step
    matrix<lower=0>[N, 2] alpha_teams;
    matrix<lower=0>[N, 2] beta_teams;
    for (n in 1:N) {
            int team_a[3] = X[n, 1:3];
            int team_b[3] = X[n, 4:6];
            alpha_teams[n,1] = sum(alpha[TT[n], team_a]);
            alpha_teams[n,2] = sum(alpha[TT[n], team_b]);
            beta_teams[n, 1] = sum(beta[TT[n], team_a]);
            beta_teams[n, 2] = sum(beta[TT[n], team_b]);
model {
    // priors 
    for (t in 1:T) {
        alpha[t] ~ gamma(alpha_a, alpha_b);
        beta[t] ~ gamma(beta_a, beta_b);
    alpha_a ~ cauchy(0, 2.5);
    alpha_b ~ cauchy(0, 2.5);
    beta_a ~ cauchy(0, 2.5);
    beta_b ~ cauchy(0, 2.5);
    tau ~ cauchy(0, 2.5);
    L ~ lkj_corr_cholesky(2);
    // time series
    for (t in 2:T) {
        for (j in 1:J) {
            [alpha[t,j], beta[t,j]] ~ multi_normal_cholesky([alpha[t-1,j], beta[t-1,j]], diag_pre_multiply(tau,L));

    // poisson model, the last bit excludes ties as they are not possible
    for (n in 1:N) {
        target += poisson_lpmf(GT[n,1] | alpha_teams[n,1]*beta_teams[n,2]) + 
                  poisson_lpmf(GT[n,2] | alpha_teams[n,2]*beta_teams[n,1]) -
                  log1m_exp(-(alpha_teams[n,1]*beta_teams[n,2] + alpha_teams[n,2]*beta_teams[n,1])*

Any help would be appreciated! I feel like there are a lot of opportunities for vectorization here, but I’m not quite savvy enough to figure it out.

Whelp, bigger issues – the model doesn’t quite seem to do what I had hoped. Seems like throwing the time series bit in there really threw the
Poisson estimates off quite a bit.

If anyone has any thoughts on how to better approach a model of this style, it’d be appreciated!

I have already encountered similar models for mortality forecasting, namely a count process where the inner parameters follow some time series.

In my experience, the slow sampling comes from the call to the multi_normal_cholesky. One way I found to tackle this is to use a well-known property of a bivariate normal to come back to independent normals (see Wikipedia):

Therefore, if you want to have something like:

You can write the following lines of code:

target += normal_lpdf(alpha[2:T] | alpha[1:(T- 1)], sigma[1]);
target += normal_lpdf(beta[2:T] | beta[1:(T- 1)] + rho * sigma[2]*(alpha[2:T]-alpha[1:(T- 1)])/ sigma[1],
sigma[2] * sqrt(1 - square(rho)));

Hence, you just need to call independent univariate normals, which is much faster than calling the multi_normal_cholesky function

Hope it helps!

1 Like

Clever! I like this and will definitely give it a shot.

You mention having seen similar models in mortality forecasting, would you happen to have any references/papers I could take a look at? I think it would help me conceptually to take a look at how they specify their models.

1 Like

Well, if I can promote my own research, you can have a look at the paper I wrote with my colleagues.

In this paper, you will see the priors and likelihood specifications, and some commonly used mortality models.

For the stan implementation, I worked on a R package implementing 5 mortality models in Stan:

In particular, two of them (CBD and M6 models) have multivariate random walks priors similar to your case.

1 Like

I’ll dig in, thank you for the help!

1 Like