Optimizing sampling of a large Poisson model with a multivariate time series component

I’ve been working on a variation of a model that has been used to describe the outcomes of games in some sports. The basic model starts like this:

g_h \sim \mathrm{Poisson}(\alpha_h \beta_a)\\ g_a \sim \mathrm{Poisson}(\alpha_a \beta_h)

where g_h, g_a are the goals scored in a game by the home and the away team, respectively. The coefficients \alpha and \beta capture a team’s offensive and defensive skill: a high \alpha means the team is strong on offense, whereas a low \beta means the team is hard to score against. Those parameters are estimated from the outcomes of many games against different teams. All good so far.
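
In Stan, the basic team-level model looks roughly like this (a minimal sketch: the data layout and the gamma priors here are illustrative assumptions, not part of my actual model):

data {
    int<lower=1> N; // num games
    int<lower=1> K; // num teams
    int<lower=1, upper=K> home[N]; // home team index per game
    int<lower=1, upper=K> away[N]; // away team index per game
    int<lower=0> g_home[N]; // goals scored by the home team
    int<lower=0> g_away[N]; // goals scored by the away team
}
parameters {
    vector<lower=0>[K] alpha; // offensive skill
    vector<lower=0>[K] beta; // defensive skill
}
model {
    alpha ~ gamma(2, 2); // illustrative priors
    beta ~ gamma(2, 2);
    g_home ~ poisson(alpha[home] .* beta[away]);
    g_away ~ poisson(alpha[away] .* beta[home]);
}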

I’ve made some adjustments to this. First, I decompose the team-level coefficients into sums of the individual players’ coefficients:

\alpha_h = \sum_k \alpha_{h,k} \qquad \beta_h = \sum_k \beta_{h,k}

In my case, players regularly play on different teams, so there’s interest in inference on a player-by-player basis. This obviously increases the number of parameters in the model a good bit, but no matter.

The second change I made is to model the parameters as a time series to infer changes in player performance/skill over time. My attempt is to model each player’s coefficients as follows:

\begin{pmatrix} \alpha_{t+1}\\ \beta_{t+1} \end{pmatrix} \sim \mathrm{Normal}\left( \begin{pmatrix} \alpha_t\\ \beta_t \end{pmatrix}, \begin{pmatrix} \sigma_\alpha^2 & \rho \sigma_\alpha \sigma_\beta\\ \rho \sigma_\alpha \sigma_\beta & \sigma_\beta^2 \end{pmatrix} \right)

This again increases the number of parameters, as each player now has two parameters for each time period.
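
In the Stan code below, this covariance matrix is represented by a scale vector \tau = (\sigma_\alpha, \sigma_\beta)^\top together with the Cholesky factor L of the 2 \times 2 correlation matrix \Omega, since

\Sigma = \mathrm{diag}(\tau)\, \Omega\, \mathrm{diag}(\tau), \qquad \Omega = \begin{pmatrix} 1 & \rho\\ \rho & 1 \end{pmatrix} = L L^\top

which makes diag_pre_multiply(tau, L) = \mathrm{diag}(\tau)\, L the Cholesky factor of \Sigma.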

The end result is that the model has gotten rather slow to sample with a few thousand games of data, a couple hundred players, and a dozen or so time periods. So I’m curious whether anyone could give me some strategies to optimize the code I’ve written. I know things like vectorization can help speed these models up, but I’m still relatively new to writing larger models in Stan.

My full code is as follows:

data {
    int<lower=1> N; // num games
    int<lower=1> J; // num players
    int<lower=1> T; // num time periods
    int<lower=1,upper=T> TT[N]; // time period of each game
    int<lower=1,upper=J> X[N, 6]; // player indices: columns 1-3 are team 1, 4-6 are team 2
    int<lower=0> G[N, 6]; // goals per player
}
transformed data {
    int<lower=0> GT[N, 2]; // goals per team (column 1 = team 1, column 2 = team 2)
    
    for (n in 1:N) {
        GT[n,1] = sum(G[n,1:3]);
        GT[n,2] = sum(G[n,4:6]);
    }
}
parameters {
    matrix<lower=0>[T, J] alpha; // attacking
    matrix<lower=0>[T, J] beta; // defending
    
    // time series params
    cholesky_factor_corr[2] L; // correlation
    vector<lower=0>[2] tau;  // scale (standard deviations)
    
    // hyperparameters of player coefficients
    real<lower=0> alpha_a;
    real<lower=0> alpha_b;
    real<lower=0> beta_a;
    real<lower=0> beta_b;
}
transformed parameters {
    // Sum player coefficients into team-level coefficients
    // for each game here, so we don't have to do it in the
    // model block
    matrix<lower=0>[N, 2] alpha_teams;
    matrix<lower=0>[N, 2] beta_teams;
    
    for (n in 1:N) {
        {
            int team_1[3] = X[n, 1:3];
            int team_2[3] = X[n, 4:6];
            alpha_teams[n, 1] = sum(alpha[TT[n], team_1]);
            alpha_teams[n, 2] = sum(alpha[TT[n], team_2]);
            beta_teams[n, 1] = sum(beta[TT[n], team_1]);
            beta_teams[n, 2] = sum(beta[TT[n], team_2]);
        }
    }
}
model {
    // priors 
    for (t in 1:T) {
        alpha[t] ~ gamma(alpha_a, alpha_b);
        beta[t] ~ gamma(beta_a, beta_b);
    }
    alpha_a ~ cauchy(0, 2.5);
    alpha_b ~ cauchy(0, 2.5);
    beta_a ~ cauchy(0, 2.5);
    beta_b ~ cauchy(0, 2.5);
    tau ~ cauchy(0, 2.5);
    L ~ lkj_corr_cholesky(2);
    
    // time series: bivariate random walk for each player's coefficients
    {
        matrix[2, 2] L_Sigma = diag_pre_multiply(tau, L); // Cholesky factor of the innovation covariance
        for (t in 2:T) {
            for (j in 1:J) {
                [alpha[t, j], beta[t, j]] ~ multi_normal_cholesky([alpha[t-1, j], beta[t-1, j]], L_Sigma);
            }
        }
    }

    // Poisson likelihood; the last term conditions on no tie, as ties are not possible
    for (n in 1:N) {
        real lambda_1 = alpha_teams[n, 1] * beta_teams[n, 2]; // scoring rate of team 1
        real lambda_2 = alpha_teams[n, 2] * beta_teams[n, 1]; // scoring rate of team 2
        target += poisson_lpmf(GT[n, 1] | lambda_1)
                + poisson_lpmf(GT[n, 2] | lambda_2)
                - log1m_exp(-(lambda_1 + lambda_2)
                            + log(modified_bessel_first_kind(0, 2 * sqrt(lambda_1 * lambda_2))));
    }
}
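
For anyone wondering where the Bessel function comes from: with \lambda_1 and \lambda_2 denoting the two teams’ scoring rates in a game, the probability that two independent Poisson counts are equal has the closed form

P(\text{tie}) = \sum_{k=0}^{\infty} e^{-\lambda_1} \frac{\lambda_1^k}{k!}\, e^{-\lambda_2} \frac{\lambda_2^k}{k!} = e^{-(\lambda_1 + \lambda_2)}\, I_0\!\left(2\sqrt{\lambda_1 \lambda_2}\right)

so subtracting \log\left(1 - P(\text{tie})\right) (the log1m_exp term) renormalizes the likelihood to the event that the game is not tied.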

Any help would be appreciated! I feel like there are a lot of opportunities for vectorization here, but I’m not quite savvy enough to figure it out.

Whelp, bigger issues: the model doesn’t quite seem to do what I had hoped. It seems like throwing the time series bit in there really threw the Poisson estimates off quite a bit.

If anyone has any thoughts on how to better approach a model of this style, it’d be appreciated!

I have already encountered similar models in mortality forecasting, namely count processes whose underlying parameters follow a time series.

In my experience, the slow sampling comes from the call to multi_normal_cholesky. One way I found to tackle this is to use a well-known property of the bivariate normal to come back to independent normals (see Wikipedia): if

\begin{pmatrix} X\\ Y \end{pmatrix} \sim \mathrm{Normal}\left( \begin{pmatrix} \mu_X\\ \mu_Y \end{pmatrix}, \begin{pmatrix} \sigma_X^2 & \rho \sigma_X \sigma_Y\\ \rho \sigma_X \sigma_Y & \sigma_Y^2 \end{pmatrix} \right)

then X \sim \mathrm{Normal}(\mu_X, \sigma_X) and

Y \mid X \sim \mathrm{Normal}\left( \mu_Y + \rho \frac{\sigma_Y}{\sigma_X} (X - \mu_X),\; \sigma_Y \sqrt{1 - \rho^2} \right)

Therefore, if you want to have something like

\begin{pmatrix} \alpha_{t+1}\\ \beta_{t+1} \end{pmatrix} \sim \mathrm{Normal}\left( \begin{pmatrix} \alpha_t\\ \beta_t \end{pmatrix}, \begin{pmatrix} \sigma_\alpha^2 & \rho \sigma_\alpha \sigma_\beta\\ \rho \sigma_\alpha \sigma_\beta & \sigma_\beta^2 \end{pmatrix} \right)

you can write the following lines of code:

// to_vector() flattens the matrix slices, since distributions are
// vectorized over vectors and arrays but not matrices
target += normal_lpdf(to_vector(alpha[2:T]) | to_vector(alpha[1:(T-1)]), sigma[1]);
target += normal_lpdf(to_vector(beta[2:T]) |
                      to_vector(beta[1:(T-1)] + rho * sigma[2] * (alpha[2:T] - alpha[1:(T-1)]) / sigma[1]),
                      sigma[2] * sqrt(1 - square(rho)));

Hence, you only need independent univariate normals, which is much faster than calling the multi_normal_cholesky function.
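
In case it helps, a sketch of the parameter declarations this implies (replacing L and tau from the original model; without an explicit prior, rho is implicitly uniform on its support):

    real<lower=-1, upper=1> rho; // innovation correlation
    vector<lower=0>[2] sigma; // innovation standard deviations (sigma_alpha, sigma_beta)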

Hope it helps!


Clever! I like this and will definitely give it a shot.

You mention having seen similar models in mortality forecasting; would you happen to have any references/papers I could take a look at? I think it would help me conceptually to see how they specify their models.


Well, if I may promote my own research, you can have a look at the paper I wrote with my colleagues.

In this paper, you will see the prior and likelihood specifications for some commonly used mortality models.

For the Stan implementation, I worked on an R package implementing five mortality models in Stan.

In particular, two of them (the CBD and M6 models) have multivariate random walk priors similar to your case.


I’ll dig in, thank you for the help!
