Non-centered parameterization of ordered logit model

Hi all,

I have the following centered ordered logit model

data {
    int J;                                                  // number of categories
    int<lower=0> N;
    int<lower=1> K;                                         // number of predictors
    matrix[N, K] X;                                         // covariates
    array[N] int<lower=1, upper=J> y;
}

transformed data {
    int i = 3;                                              // number of spacings between cutoffs {s1, s2, s3}
}

parameters {
    corr_matrix[K] Omega;
    vector<lower=0>[i] s;                                   // spacings between cutoff thresholds
    vector<lower=0>[K] sigma;
    vector[K] beta;
    vector[K] mu;
}

transformed parameters {
    ordered[J - 1] c;                                       // cut off thresholds
    c[4] = s[1];
    c[5] = s[1] + s[2];
    c[6] = s[1] + s[2] + s[3];
    c[1] = -c[6];
    c[2] = -c[5];
    c[3] = -c[4];
    vector[N] lambda = X * beta;
}


model {
    // priors
    Omega ~ lkj_corr(2.0);
    mu ~ normal(0, 5);
    sigma ~ normal(0, 2.5);
    s ~ std_normal();

    // likelihood
    beta ~ multi_normal(mu, quad_form_diag(Omega, sigma));
    y ~ ordered_logistic(lambda, c);
}

A lot of divergences occur when running the model. Therefore, I would like to check whether the model benefits from non-centered parameterization.
I introduce delta and rewrite the code as follows:

data {
    int J;                                                  // number of categories
    int<lower=0> N;
    int<lower=1> K;                                         // number of predictors
    matrix[N, K] X;                                         // covariates
    array[N] int<lower=1, upper=J> y;
}

transformed data {
    int i = 3;                                              // number of spacings between cutoffs {s1, s2, s3}
}

parameters {
    corr_matrix[K] Omega;
    vector<lower=0>[i] s;                                   // spacings between cutoff thresholds
    vector<lower=0>[K] sigma;
    vector[K] mu;
    vector<lower=0>[K] delta;
}

transformed parameters {
    ordered[J - 1] c;                                       // cut off thresholds
    c[4] = s[1];
    c[5] = s[1] + s[2];
    c[6] = s[1] + s[2] + s[3];
    c[1] = -c[6];
    c[2] = -c[5];
    c[3] = -c[4];
    vector<lower=0>[K] beta = mu + delta * quad_form_diag(Omega, sigma);
    vector[N] lambda = X * beta;
}


model {
    // priors
    Omega ~ lkj_corr(2.0);
    mu ~ normal(0, 5);
    sigma ~ normal(0, 2.5);
    s ~ std_normal();
    delta ~ std_normal();

    // likelihood
    y ~ ordered_logistic(lambda, c);
}

Running this model results in an error on this line

vector<lower=0>[K] beta = mu + delta * quad_form_diag(Omega, sigma);

as the datatypes are not compatible.

Please show me how its written correctly.
Thanks in advance!

Hi, the error is associated with incompatable dimensionsions. delta is a K\times 1 vector and quad_form_diag(Omega, sigma) is a K \times K matrix. Also the non-centered parameterization is incorrect in the current form. You need to pre-multiply delta by the cholesky factor of the covariance matrix:

 vector<lower=0>[K] beta = mu + cholesky_decompose(quad_form_diag(Omega, sigma)) * delta;

Thank you! The divergences have disappeared, but new problems have surfaced

025-01-16 23:11:47.520: WARNING - chain #1: 995 of 1000 transitions (99.5%) saturated the maximum treedepth
2025-01-16 23:11:47.520: WARNING - chain #2: 893 of 1000 transitions (89.3%) saturated the maximum treedepth
2025-01-16 23:11:47.520: WARNING - chain #3: 986 of 1000 transitions (98.6%) saturated the maximum treedepth
2025-01-16 23:11:47.521: WARNING - chain #4: 931 of 1000 transitions (93.1%) saturated the maximum treedepth

Any suggestions?

The main problem in this model is that you only have one K-vector, namely beta, which is not enough data to fit a multivariate normal. You have no data at all informing the covariance with only a single observation. So this isn’t going to work.

If the entries in beta are exchangeable conceptually, I’d suggest a simpler hierarchical prior on beta, namely:

   beta ~ normal(mu, sigma);

There you have K replicates with which to estimate the scale sigma. And then it’s easy to put it in non-centered form—just declare beta as

   vector<offset=mu, multiplier=sigma>[K] beta;

This is the second time I’ve seen people making assumptions about the ordering being symmetric around its center (here, setting c[1] = -c[6], etc. Why are you making that assumption?

Also, if you are making that assumption you can do this all a bit more easily this way:

parameters {
  pos_ordered[3] c_pos;
...
transformed parameters {
  ordered[6] c = append_row(-reverse(c_pos), c_pos);

That then requires a slightly different prior on s. It’s typical for people to use zero-avoiding priors here rather than the half normals like you use, which puts a lot of probability near cutpoints collapsing (e.g., with s[2] = 0, which is the mode of the prior). So something like

c_pos[2:3] - c_pos[1:2] ~ lognormal(...);  // zero-avoiding prior

This is an improper prior as it only governs 2 of the 3 degrees of freedom. You can also put a prior on c_pos[1] to complete it, which might be something like bounding c_pos away from zero, which you can do this way to govern the distance between cutpoints 3 and 4,

2 * c_pos[1] ~ lognormal(...);
1 Like