# Variable selection with ordinal model

A year after my previous post, I would like to know if you have experience with how to perform variable selection with an ordinal regression model, even with sub-optimal strategies like brute force (i.e. checking LOO-IC by running a lot of models). I have many predictors, so I would spend years checking all the combinations.

How many? And how much data? The ratio of predictors to observations and the signal-to-noise ratio affect what to recommend.

@AlejandroCatalina and @paul.buerkner could comment on how much work it would be to get ordinal regression to work with projpred.

I have 6252 observations on K = 4 levels (C-B-A-A+) and 32 predictors.
There are 70 observations in class C, 2294 in class B, 3152 in class A, and 70 in class A+.

I briefly discussed the potential of supporting ordinal regression models with @paul.buerkner and there are a few challenges that won't allow us to include them in the near future.

Namely, the most challenging issue from the theoretical point of view is that most ordinal regression models don't belong to the exponential family, which complicates things quite a bit. The broader question of supporting non-exponential-family models is very high on our TODO list, but I guess we should not expect this to be solved very soon.

Thanks. Are there approximate methods or some other strategies to work with the `ordered_logistic_lpmf`, or should I just give up?

Not in `projpred` at the moment. Regarding other approaches to do some sort of variable selection in your case, I can't really tell; indeed, the problem is big enough that trying many combinations is infeasible.

The smaller number of observations for C and A+ means some care is needed, but 32 predictors is not that much.

What's the computation time for the full model?

Can you use the regularized horseshoe or some other sparsity-inducing prior on the coefficients? There's a [Regression and Other Stories: Student](https://avehtari.github.io/ROS-Examples/Student/student.html) example showing an easy case where the RHS prior is able to detect the most useful predictors without the need for an extensive model-space search or projpred. Of course, it's not always easy to know when the case is easy, and it also depends on what you want from the variable selection.
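In `brms` a regularized horseshoe prior on the population-level coefficients can be set via `set_prior(horseshoe(...))`. A minimal sketch (the data frame `d`, the response `y`, and the guess of roughly 5 relevant predictors out of 32 are placeholders, not from the original post):

```r
library(brms)

# regularized horseshoe prior on all population-level coefficients;
# par_ratio encodes the expected ratio of relevant to irrelevant predictors
rhs_prior <- set_prior(horseshoe(df = 1, par_ratio = 5 / 32), class = "b")

fit <- brm(
  y ~ .,                       # all predictors in d
  data = d,
  family = cumulative("logit"),
  prior = rhs_prior
)
```

After fitting, coefficients shrunk tightly towards zero are candidates for removal, though the posterior summaries should be read jointly rather than one coefficient at a time.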

The full model takes 12 hours to run and has 4% divergent transitions.
I am doing something very similar to what is shown in Regression and Other Stories: Student, based on LOO-IC. Anyway, it takes a long time to compare the models…
By the way, I just discovered from your post the existence of the book "Regression and Other Stories". I will definitely buy it!

Hmm… That's a lot for both. Is the model code your own or brms-generated code?

`brms` generated. I have been working on this model - more or less successfully - for more than a year; see one of my previous posts on setting priors.
I think the problem is not the slow computation: the model as it is struggles to distill the signal that is in the response variable. To improve it, I need to test various models and various combinations of variables, and this is time-consuming…

One approach would be to simulate from the posterior predictive distribution and use that as the reference predictions, but project to a normal regression model. This would make the projection and variable selection fast, and after finding the path through the model space, compute the ordinal regression once for each model size, so there would be at most 32 models to run.


I do not understand it… but it sounds like a plan! :-)
If it's not time-consuming for you (I do not want to be pushy), would you mind helping me understand the process step by step, so that what we sort out together would be publicly available here for a broader audience?
First of all: should I build a generated observation set with a dozen generated variables (for more control), or start from an available real-case dataset?

Can you copy the brms-generated model code here, so I can use the correct variable names?

This is the Stan code with default priors (i.e. not horseshoe):

```stan
// generated with brms 2.13.5
functions {
  /* cumulative-logit log-PDF for a single response
   * Args:
   *   y: response category
   *   mu: latent mean parameter
   *   disc: discrimination parameter
   *   thres: ordinal thresholds
   * Returns:
   *   a scalar to be added to the log posterior
   */
  real cumulative_logit_lpmf(int y, real mu, real disc, vector thres) {
    int nthres = num_elements(thres);
    real p;
    if (y == 1) {
      p = inv_logit(disc * (thres[1] - mu));
    } else if (y == nthres + 1) {
      p = 1 - inv_logit(disc * (thres[nthres] - mu));
    } else {
      p = inv_logit(disc * (thres[y] - mu)) -
          inv_logit(disc * (thres[y - 1] - mu));
    }
    return log(p);
  }
  /* cumulative-logit log-PDF for a single response and merged thresholds
   * Args:
   *   y: response category
   *   mu: latent mean parameter
   *   disc: discrimination parameter
   *   thres: vector of merged ordinal thresholds
   *   j: start and end index for the applied threshold within 'thres'
   * Returns:
   *   a scalar to be added to the log posterior
   */
  real cumulative_logit_merged_lpmf(int y, real mu, real disc, vector thres, int[] j) {
    return cumulative_logit_lpmf(y | mu, disc, thres[j[1]:j[2]]);
  }
  /* ordered-logistic log-PDF for a single response and merged thresholds
   * Args:
   *   y: response category
   *   mu: latent mean parameter
   *   thres: vector of merged ordinal thresholds
   *   j: start and end index for the applied threshold within 'thres'
   * Returns:
   *   a scalar to be added to the log posterior
   */
  real ordered_logistic_merged_lpmf(int y, real mu, vector thres, int[] j) {
    return ordered_logistic_lpmf(y | mu, thres[j[1]:j[2]]);
  }
}
data {
  int<lower=1> N;  // number of observations
  int Y[N];  // response variable
  int<lower=2> nthres;  // number of thresholds
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int Kc = K;
  matrix[N, Kc] Xc;  // centered version of X
  vector[Kc] means_X;  // column means of X before centering
  for (i in 1:K) {
    means_X[i] = mean(X[, i]);
    Xc[, i] = X[, i] - means_X[i];
  }
}
parameters {
  vector[Kc] b;  // population-level effects
  ordered[nthres] Intercept;  // temporary thresholds for centered predictors
}
transformed parameters {
  real<lower=0> disc = 1;  // discrimination parameter
}
model {
  // initialize linear predictor term
  vector[N] mu = Xc * b;
  // priors including all constants
  target += student_t_lpdf(Intercept | 3, 0, 2.5);
  // likelihood including all constants
  if (!prior_only) {
    for (n in 1:N) {
      target += ordered_logistic_lpmf(Y[n] | mu[n], Intercept);
    }
  }
}
generated quantities {
  // compute actual thresholds
  vector[nthres] b_Intercept = Intercept + dot_product(means_X, b);
}
```

A quick note: the text directly after the first figure on https://avehtari.github.io/ROS-Examples/Student/student.html seems to be wrong.


Thanks, fixed (github.io seems to be slow to show the updated HTML, but it's in the repo).


For the cumulative ordinal model with logit link, the latent model is close to normal linear with sd = 1.6. Here's a quick draft of what to do, but I didn't have time to check all the details. If you have problems, @paul.buerkner said he is happy to help get this working.

See the "Custom reference models" section in the projpred Quick Start. Create the reference model prediction function as (something like)

```r
predfun <- function(xt) t( b %*% t(xt) )
```

where `b` contains the posterior draws of `b` in the model. Then initialize the reference model (something like)

```r
sigma <- rep(1.6, S)
Emu <- colMeans(mu)
ref <- init_refmodel(Xc, Emu, gaussian(), x = x, predfun = predfun, dis = sigma)
```

where `S` is the number of posterior draws and `mu` contains the posterior draws of `mu`. `Emu` is not used for the projection, but it is used for the RMSE computation.
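With the reference model object in hand, the selection itself would follow the usual projpred workflow. A sketch, assuming the projpred version that provides `init_refmodel` (function names from that API; not verified against this particular setup):

```r
library(projpred)

# forward search through the model space using the Gaussian projection
vs <- varsel(ref)

# inspect how predictive performance changes with model size
varsel_plot(vs, stats = "rmse")

# a suggested number of predictors, based on the performance curve
suggest_size(vs)
```

The search gives the ordering of the predictors; the final step, as described above, is then to refit the actual ordinal regression once per candidate model size and compare those fits directly.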


Thanks a lot to you and to @paul.buerkner too.
I will try it in a few hours (I have field work until sunset…), and later I will post the results and discussion here.
Stay tuned… :-)