Estimate the spacing between cutoff points of a hierarchical ordered-logit model

Hi,

I want to reproduce the work of this paper. The paper fits a hierarchical ordered-logit model on the scores of judges in MMA fights.

Let y_n ∈ {7-10, 8-10, 9-10, 10-10, 10-9, 10-8, 10-7} denote the score given by a judge.
There are N observations of judges’ scores, J judges and K predictors.

The authors model y_n using an ordered-logit regression with mean lambda_n and thresholds indicating the cutoffs of each category denoted by t = (t1, ... , t6).
The goal is to estimate the spacing s between the thresholds.

The fight starts at 10-10. Any subsequent action shifts the predicted score probabilities away from 10-10 in either direction. Consequently, s_1 denotes the spacing between zero and the threshold for a fighter winning 10-9, s_2 denotes the spacing between the 10-9 and 10-8 thresholds, and s_3 denotes the spacing between the 10-8 and 10-7 thresholds.
The vector t = (−s1 − s2 − s3, −s1 − s2, −s1, s1, s1 + s2, s1 + s2 + s3)
denotes the six cutoffs.

Each judge has an individual set of parameters, representing the value they attribute to each action, denoted by beta_j = (beta_{j,1}, ..., beta_{j,K}).

Summarised, the model is defined as follows:
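With j[n] denoting the judge who scored observation n (my shorthand; the paper may write it differently):

$$
\begin{aligned}
y_n &\sim \mathrm{OrderedLogistic}(\lambda_n, t) \\
\lambda_n &= x_n^\top \beta_{j[n]} \\
\beta_j &\sim \mathrm{MultiNormal}(\mu, \Sigma), \quad j = 1, \dots, J \\
t &= (-s_1 - s_2 - s_3,\ -s_1 - s_2,\ -s_1,\ s_1,\ s_1 + s_2,\ s_1 + s_2 + s_3), \quad s_1, s_2, s_3 > 0
\end{aligned}
$$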

How do I define t using Stan syntax?
In addition, the example in the Stan guide focuses on estimating the cutoff points, not the spacing between the cutoff points.

Can you provide feedback on my model implementation for a single judge?

data {
    int<lower=2> N;                         // number of observations
    int<lower=1, upper=7> y[N];             // 1 = 7-10 and 7 = 10-7
    int K;                                  // number of predictors
    matrix[N, K] X;
    int i;                                  // number of spacings = 3
}

parameters {
    cholesky_factor_corr[K] L_Omega;
    vector[K] mu;
    vector[K] beta;
    vector<lower=0>[K] sigma;
    vector<lower=0>[i] s;                   // spacings between cutoffs
}

transformed parameters {
    matrix[K, K] L_Sigma = diag_pre_multiply(sigma, L_Omega);
    vector[N] lambda = X * beta;
}

model {
    mu ~ normal(0, 5);
    sigma ~ normal(0, 2.5);
    s ~ normal(0, 5);
    L_Omega ~ lkj_corr_cholesky(2);


    beta ~ multi_normal_cholesky(mu, L_Sigma);

    for (n in 1:N) {
        y[n] ~ ordered_logistic(lambda[n], t);
    }
}

I think the ordered_logistic implementation in Stan is parametrized in terms of the cutpoints (not their spacing), so you will probably need to work directly with those. But it seems like it should be pretty easy to transform the estimated cutpoints back to the spacings by just taking the difference between consecutive values for each posterior draw.
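For example, if the cutpoints were declared as ordered[6] c in the parameters block, a generated quantities block along these lines would recover the spacings for every draw:

generated quantities {
    // differences between consecutive cutpoints, one set per posterior draw
    vector[5] spacing = c[2:6] - c[1:5];
}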

You could even define the cutpoints c in the transformed parameters block as a function of the spacings if you want.
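A minimal sketch of that (the names c1 and delta are just placeholders, not anything from the paper):

parameters {
    real c1;                                // location of the first cutpoint
    vector<lower=0>[5] delta;               // spacings between consecutive cutpoints
}

transformed parameters {
    ordered[6] c;                           // cutpoints rebuilt from the spacings
    c[1] = c1;
    for (k in 2:6)
        c[k] = c[k - 1] + delta[k - 1];
}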

Yes, but if you want to create a proper prior, you need to be aware that there are K cutpoints, but only K - 1 differences. So a prior just on the differences will be improper. If you want a zero-avoiding prior on the differences, you can do this:

cutpoints[2:K] - cutpoints[1:K-1] ~ lognormal(mu_cut, sigma_cut);

This is still not a proper prior on a K-dimensional vector of cutpoints.

We talk about this case in our prior choice Wiki, but are a bit wishy-washy about what to do.
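One way to pin down the remaining degree of freedom is to also give a single anchor cutpoint a proper prior, e.g.

cutpoints[1] ~ normal(0, 5);

(the normal(0, 5) here is just an illustrative choice). Since the map from (cutpoints[1], differences) to the full cutpoint vector is one-to-one and linear, the joint prior on all K cutpoints is then proper.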


@Bob_Carpenter Thanks for the tips.

What do you think about the following implementation?

data {
    int<lower=2> N;                         // number of observations
    int<lower=2> S;                         // number of scores
    int<lower=1, upper=S> y[N];             // {7-10, 8-10, 9-10, 10-10, 10-9, 10-8, 10-7}
    int K;                                  // number of predictors
    matrix[N, K] X;                         // covariates
}

transformed data {
    int i = 3;                              // number of spacings
}

parameters {
    ...
    vector[K] beta;
    vector<lower=0>[i] s;
}

transformed parameters {
    ...
    vector[N] lambda = X * beta;
    ordered[S - 1] t;                       // cutoffs built from the spacings
    t[1] = -1 * (s[1] + s[2] + s[3]);
    t[2] = -1 * (s[1] + s[2]);
    t[3] = -1 * s[1];
    t[4] = s[1];
    t[5] = s[1] + s[2];
    t[6] = s[1] + s[2] + s[3];
}

model {
    // prior
    ...
    s ~ normal(0, 5);

    // likelihood
    ...
    y ~ ordered_logistic(lambda, t);
}

Sorry—just saw this now.

I’m not sure why you want these to be symmetric like this nor why you want them to be composites.

What I was thinking of is

positive_ordered[6] t;

P.S. If you really do want to code this, define t[4] through t[6] and then set t[1] = -t[6]; t[2] = -t[5], .... It makes the symmetry clearer and cuts down on arithmetic (in general, you can just negate rather than multiplying by -1).
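That is, something like this (reusing the s from your transformed parameters block):

transformed parameters {
    ordered[S - 1] t;
    // positive side first, then mirror it for the negative side
    t[4] = s[1];
    t[5] = s[1] + s[2];
    t[6] = s[1] + s[2] + s[3];
    t[1] = -t[6];
    t[2] = -t[5];
    t[3] = -t[4];
}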


The authors state in Chapter 3 of their paper:

“To ensure our model is realistic, the probability of a fighter getting a 10-9 must be identical to the probability their opponent receives a 9-10. To implement this in our ordered logit, we do not directly estimate the cutoffs of each threshold, but instead estimate the spacing. … Consequently, we ensure the spacings are positive, the cutoffs are ordered correctly, and there is the required symmetry”

I don’t see a reason to disagree with them. Their methodology is satisfied by the current implementation, right?

I fully agree with you, thanks for the feedback.

But isn’t that enforced by generating only one of the results and then flipping it around for the opponent? I don’t see why it’d lead to symmetric spacing of the cutpoints.

That’s the usual setup: if there’s an intercept in the regression, then the location isn’t identified, only the spacings between the thresholds.