# Variable selection with ordinal model

A year after my previous post, I would like to know if you have experience with how to perform variable selection with an ordinal regression model, even with sub-optimal strategies like brute force (i.e. checking LOO-IC by running a lot of models). I have many predictors, so I would spend years checking all the combinations.

How many? And how much data? The ratio of predictors to observations and the signal-to-noise ratio affect what to recommend.

@AlejandroCatalina and @paul.buerkner could comment on how much work it would be to get ordinal regression to work with projpred.

I have 6252 observations on K = 4 levels (C-B-A-A+) and 32 predictors.
There are 70 observations in class C, 2294 in class B, 3152 in class A, and 70 in class A+.

I briefly discussed the potential of supporting ordinal regression models with @paul.buerkner and there are a few challenges that won't allow us to include them in the near future.

Namely, the most challenging issue from the theoretical point of view is that most ordinal regression models don't belong to the exponential family, which complicates things quite a bit. The broader question of supporting non-exponential-family models is very high on our TODO list, but I guess we should not expect this to be solved very soon.

Thanks. Are there approximate methods or some other strategies to work with the `ordered_logistic_lpmf`, or should I just give up?

Not in `projpred` at the moment. Regarding other approaches to do some sort of variable selection in your case, I can't really tell; indeed, the problem is big enough that trying many combinations is infeasible.

The smaller number of observations for C and A+ means some care is needed, but 32 predictors is not that much.

What's the computation time for the full model?

Can you use the regularized horseshoe or some other sparsity-inducing prior on the coefficients? There's a [Regression and Other Stories: Student](https://avehtari.github.io/ROS-Examples/Student/student.html) example showing an easy case where the RHS prior is able to detect the most useful predictors without the need for an extensive model-space search or projpred. Of course, it's not always easy to know when the case is easy, and it also depends on what you want from the variable selection.
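In `brms` a regularized horseshoe prior on the population-level coefficients can be set via `set_prior(horseshoe(...))`. A minimal sketch (the data frame `d`, the response `y`, and the guess of roughly 5 relevant predictors out of 32 are placeholders, not from the original post):

```r
library(brms)

# regularized horseshoe prior on all population-level coefficients;
# par_ratio encodes the expected ratio of relevant to irrelevant predictors
rhs_prior <- set_prior(horseshoe(df = 1, par_ratio = 5 / 32), class = "b")

fit <- brm(
  y ~ .,                       # all predictors in d
  data = d,
  family = cumulative("logit"),
  prior = rhs_prior
)
```

After fitting, coefficients shrunk tightly towards zero are candidates for removal, though the posterior summaries should be read jointly rather than one coefficient at a time.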

The full model takes 12 hours to run and has 4% divergent transitions.
I am doing something very similar to what is shown in Regression and Other Stories: Student, based on LOO-IC. Anyway, it takes a long time to compare the models…
By the way, I just discovered from your post the existence of the book "Regression and Other Stories". I will definitely buy it!

Hmm… That's a lot for both. Is the model code your own or brms-generated code?

`brms` generated. I have been working on this model - more or less successfully - for more than a year; see one of my previous posts on setting priors.
I think the problem is not the slow computation: the model as it is struggles to distill the signal that is in the response variable. To improve it, I need to test various models and various combinations of variables, and this is time-consuming…

One approach would be to simulate from the posterior predictive distribution and use that as the reference predictions, but project to a normal regression model. This would make the projection and variable selection fast, and after finding the path through the model space, compute the ordinal regression once for each model size, so there would be at most 32 models to run.


I do not understand it… but it sounds like a plan! :-)
If it's not time-consuming for you (I do not want to be pushy), would you mind helping me understand the process step by step, so that what we sort out together would be publicly available here for a broader audience?
First of all: should I build a generated observation set with a dozen generated variables (for more control), or start from an available real-case dataset?

Can you copy the brms-generated model code here, so I can use the correct variable names?

This is the Stan code with default priors (i.e. not horseshoe):

```stan
// generated with brms 2.13.5
functions {
  /* cumulative-logit log-PDF for a single response
   * Args:
   *   y: response category
   *   mu: latent mean parameter
   *   disc: discrimination parameter
   *   thres: ordinal thresholds
   * Returns:
   *   a scalar to be added to the log posterior
   */
  real cumulative_logit_lpmf(int y, real mu, real disc, vector thres) {
    int nthres = num_elements(thres);
    real p;
    if (y == 1) {
      p = inv_logit(disc * (thres[1] - mu));
    } else if (y == nthres + 1) {
      p = 1 - inv_logit(disc * (thres[nthres] - mu));
    } else {
      p = inv_logit(disc * (thres[y] - mu)) -
          inv_logit(disc * (thres[y - 1] - mu));
    }
    return log(p);
  }
  /* cumulative-logit log-PDF for a single response and merged thresholds
   * Args:
   *   y: response category
   *   mu: latent mean parameter
   *   disc: discrimination parameter
   *   thres: vector of merged ordinal thresholds
   *   j: start and end index for the applied threshold within 'thres'
   * Returns:
   *   a scalar to be added to the log posterior
   */
  real cumulative_logit_merged_lpmf(int y, real mu, real disc, vector thres, int[] j) {
    return cumulative_logit_lpmf(y | mu, disc, thres[j[1]:j[2]]);
  }
  /* ordered-logistic log-PDF for a single response and merged thresholds
   * Args:
   *   y: response category
   *   mu: latent mean parameter
   *   thres: vector of merged ordinal thresholds
   *   j: start and end index for the applied threshold within 'thres'
   * Returns:
   *   a scalar to be added to the log posterior
   */
  real ordered_logistic_merged_lpmf(int y, real mu, vector thres, int[] j) {
    return ordered_logistic_lpmf(y | mu, thres[j[1]:j[2]]);
  }
}
data {
  int<lower=1> N;  // number of observations
  int Y[N];  // response variable
  int<lower=2> nthres;  // number of thresholds
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int Kc = K;
  matrix[N, Kc] Xc;  // centered version of X
  vector[Kc] means_X;  // column means of X before centering
  for (i in 1:K) {
    means_X[i] = mean(X[, i]);
    Xc[, i] = X[, i] - means_X[i];
  }
}
parameters {
  vector[Kc] b;  // population-level effects
  ordered[nthres] Intercept;  // temporary thresholds for centered predictors
}
transformed parameters {
  real<lower=0> disc = 1;  // discrimination parameter
}
model {
  // initialize linear predictor term
  vector[N] mu = Xc * b;
  // priors including all constants
  target += student_t_lpdf(Intercept | 3, 0, 2.5);
  // likelihood including all constants
  if (!prior_only) {
    for (n in 1:N) {
      target += ordered_logistic_lpmf(Y[n] | mu[n], Intercept);
    }
  }
}
generated quantities {
  // compute actual thresholds
  vector[nthres] b_Intercept = Intercept + dot_product(means_X, b);
}
```

A quick note: the text directly after the first figure on https://avehtari.github.io/ROS-Examples/Student/student.html seems to be wrong.


Thanks, fixed (github.io seems to be slow to show the updated HTML, but it's in the repo).


For the cumulative ordinal model with logit link, the latent model is close to normal linear with sd = 1.6. Here's a quick draft of what to do, but I didn't have time to check all the details. If you have problems, @paul.buerkner said he is happy to help get this working.

See the "Custom reference models" section in the projpred Quick Start. Create the reference model prediction function as (something like)

```r
predfun <- function(xt) t( b %*% t(xt) )
```

where `b` contains the posterior draws of `b` in the model. Then initialize the reference model (something like)

```r
sigma <- rep(1.6, S)
Emu <- colMeans(mu)
ref <- init_refmodel(Xc, Emu, gaussian(), x = x, predfun = predfun, dis = sigma)
```

where `S` is the number of posterior draws and `mu` contains the posterior draws of `mu`. `Emu` is not used for the projection, but it is used for the RMSE computation.
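With the reference model object in hand, the selection itself would follow the usual projpred workflow. A sketch, assuming the projpred version that provides `init_refmodel` (function names from that API; not verified against this particular setup):

```r
library(projpred)

# forward search through the model space using the Gaussian projection
vs <- varsel(ref)

# inspect how predictive performance changes with model size
varsel_plot(vs, stats = "rmse")

# a suggested number of predictors, based on the performance curve
suggest_size(vs)
```

The search gives the ordering of the predictors; the final step, as described above, is then to refit the actual ordinal regression once per candidate model size and compare those fits directly.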


Thanks a lot to you and to @paul.buerkner too.
I will try it in a few hours (I have field work until sunset…), and later I will post the results and discussion here.
Stay tuned… :-)