Understanding performance expectations of `_glm` distributions

Howdy,

I’m working on modeling star ratings (1-5 scale) with an item-response-theory-style model, with effects for individual reviewers and for the titles being reviewed. My full dataset is fairly large (~160k reviews across ~10k titles and ~45k reviewers), so I’ve been particularly focused on optimizing this model. I’ve primarily been sampling with a vectorized ordered_logistic likelihood, which has worked pretty well, but I’m trying to understand whether there are any other modifications I can make to improve performance.

In particular, I had the thought that using the _glm variant of the ordered logistic likelihood would help, but in my initial naive attempt it actually performed worse – I am assuming because my data doesn’t naturally fit into that format. My current likelihood is:

y ~ ordered_logistic(eta[g] - gamma[r], c);

where g and r are index arrays (see the full model code below). I can do the trivial glm translation and write:

y ~ ordered_logistic_glm(rep_row_vector(1, N), eta[g] - gamma[r], c);

and get something that works, but performs worse. Can I just assume that, because of the kind of parameters I have in this model, the _glm version gives me no benefit?

Other than that, does anyone have any suggestions for how this model could be sped up?

Appreciate any insight!

functions {
  real induced_dirichlet_lpdf(vector c, vector alpha, real phi) {
    int K = num_elements(c) + 1;
    vector[K - 1] sigma = inv_logit(phi - c);
    vector[K] p;
    matrix[K, K] J = rep_matrix(0, K, K);
    
    // Induced ordinal probabilities
    p[1] = 1 - sigma[1];
    for (k in 2:(K - 1))
      p[k] = sigma[k - 1] - sigma[k];
    p[K] = sigma[K - 1];
    
    // Baseline column of Jacobian
    for (k in 1:K) J[k, 1] = 1;
    
    // Diagonal entries of Jacobian
    for (k in 2:K) {
      real rho = sigma[k - 1] * (1 - sigma[k - 1]);
      J[k, k] = - rho;
      J[k - 1, k] = rho;
    }
    
    return   dirichlet_lpdf(p | alpha)
           + log_determinant(J);
  }

}
data {
    int<lower=0> N; // number of observations
    int<lower=1> G; // number of titles
    int<lower=1> R; // number of reviewers
    array[N] int<lower=1, upper=5> y; // ratings
    array[N] int<lower=1, upper=G> g; // title index
    array[N] int<lower=1, upper=R> r; // reviewer index
}
parameters {
    real mu_eta;          // mean title quality
    real<lower=0> tau;    // scale of title effects
    vector[G] eta_raw;    // non-centered title effects
    vector[R] gamma;      // reviewer severity effects
    ordered[4] c;         // ordered-logistic cutpoints
}
transformed parameters {
    vector[G] eta = mu_eta + tau * eta_raw;  // title effects (non-centered parameterization)
}
model {
    mu_eta ~ normal(0, 1);
    tau ~ normal(0, 2);
    gamma ~ normal(0, 2);
    eta_raw ~ normal(0, 1);
    c ~ induced_dirichlet(rep_vector(1, 5), 0);

    y ~ ordered_logistic(eta[g] - gamma[r], c);
}

The `_glm` functions provide a speedup if the matrix `x` and the vector `beta` are big. Your `x` is a row vector (i.e., just a one-row matrix), and as it is all 1’s it is not doing anything; in this case using `_glm` just adds an additional unnecessary multiplication by 1’s.

If the first term of your `ordered_logistic()` were of the form `x * beta` with a big `x` and a big `beta` (where the elements of `beta` are not all 1), then you could get a speedup.

The biggest speedup from `_glm()` comes from reducing the size of the autodiff expression tree. For example, if `x` is data and `beta` is a parameter vector, then only the length of the `beta` vector affects the size of the autodiff expression tree, and that determines how much speedup `_glm` provides.
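
For concreteness, here is a minimal sketch of the shape where `_glm` does help, assuming a hypothetical N x K matrix `x` of review-level covariates with coefficients `beta` (neither is part of the original model); because `x` is data, only `beta` and the cutpoints end up in the autodiff expression tree:

data {
  int<lower=0> N;                    // number of observations
  int<lower=1> K;                    // number of covariates (hypothetical)
  array[N] int<lower=1, upper=5> y;  // ratings
  matrix[N, K] x;                    // covariate matrix: data, so not autodiffed
}
parameters {
  vector[K] beta;                    // only K coefficients enter the expression tree
  ordered[4] c;                      // cutpoints
}
model {
  beta ~ normal(0, 1);
  // gradients are accumulated directly for beta and c,
  // rather than through a per-observation expression tree
  y ~ ordered_logistic_glm(x, beta, c);
}

In the original model the per-observation predictor `eta[g] - gamma[r]` is built entirely from parameters indexed by data, so there is no large data matrix for `_glm` to exploit.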

Awesome, thanks for the clarification, Aki! I assumed it was something in this vein, but the confirmation is helpful.
