Variable selection with ordinal model

AlejandroCatalina · August 13, 2020, 7:48am

Yes of course. I used the Stan models that @fabio provided earlier in this thread.

library(rstan)
library(projpred) # 2.0.0 devtools::install_github("stan-dev/projpred@develop")
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)

set.seed(123)

N <- 500 # number of observations
M <-  30 # number of predictors
K <-   5 # number of ordinal categories

c <- sort(runif(n = K - 1, min = -5, max = 5))

generative <- stan_model("./generate_data.stan")

gen <- sampling(generative,
    data = list(N = N, M = M, K = K, c = c),
    seed = 123, algorithm = "Fixed_param", iter = 1, chains = 1
)
beta <- as.numeric(extract(gen, pars = "beta")$beta)
X <- matrix(
    data = as.numeric(extract(gen, pars = "X")$X),
    nrow = N, ncol = M
)
y <- as.integer(extract(gen, pars = "y")$y)

model <- stan_model("./ordinal_regression.stan")
fit <- sampling(model,
    data = list(N = N, M = M, K = K, X = X, y = y),
    seed = 123, chains = 4
)

mu <- extract(fit, pars = "mu")
Emu <- as.matrix(colMeans(mu$mu))

beta_out <- extract(fit, pars = "beta")$beta
beta <- as.matrix(colMeans(beta_out))

predfun <- function(fit, newdata = NULL) {
    if (is.null(newdata)) {
    newdata <- data
    }
    return(as.matrix(newdata[, -1]) %*% t(beta_out))
}

## compute empirical sd; too large
sigma <- apply(mu$mu, 1, sd)

colnames(X) <- paste0("x", seq_len(NCOL(X)))
data <- as.data.frame(cbind(y = Emu, x = X))
formula <- as.formula(paste("y ~", paste(colnames(X), collapse = " + ")))

.extract_model_data <- function(object, newdata = NULL, wrhs = NULL,
                                orhs = NULL, resp_form = NULL) {
  if (is.null(newdata)) {
    newdata <- data
  }

  if (inherits(wrhs, "formula")) {
    weights <- eval_rhs(wrhs, newdata)
  } else if (is.null(wrhs)) {
    weights <- rep(1, NROW(newdata))
  }

  if (inherits(orhs, "formula")) {
    offset <- eval_rhs(orhs, newdata)
  } else if (is.null(orhs)) {
    offset <- rep(0, NROW(newdata))
  }

  if (inherits(resp_form, "formula")) {
    y <- eval_rhs(resp_form, newdata)
  } else {
    y <- NULL
  }

  return(nlist(y, weights, offset))
}

extract_model_data <- function(object, newdata = NULL, wrhs = NULL,
                               orhs = NULL) {
  resp_form <- lhs(formula)
  args <- nlist(object, newdata, wrhs, orhs, resp_form)
  return(do_call(.extract_model_data, args))
}

ref <- init_refmodel(fit, data, formula, gaussian(),
    ref_predfun = predfun, div_minimizer = linear_mle,
    proj_predfun = linear_proj_predfun,
    extract_model_data = extract_model_data, dis = sigma
)

vs <- varsel(ref, method = "forward")
cvs <- cv_varsel(ref, method = "forward")

## fix sigma
sigma <- rep(1.6, 4000)

ref <- init_refmodel(fit, data, formula, gaussian(),
    ref_predfun = predfun, div_minimizer = linear_mle,
    proj_predfun = linear_proj_predfun,
    extract_model_data = extract_model_data, dis = sigma
)

vs <- varsel(ref, method = "forward")
cvs <- cv_varsel(ref, method = "forward")

fweber144 · September 17, 2020, 2:32pm

Sorry for digging this out, but shouldn’t this line

predfun = function(xt) t( beta %*% t(xt) )

read

predfun = function(xt) t( beta_out %*% t(xt) )

for your simulated example?

In any case, thanks for this very interesting discussion!

AlejandroCatalina · September 22, 2020, 12:53pm

Yes, I believe the code I shared just above includes the right predfun!

fabio · August 15, 2021, 8:21am

@AlejandroCatalina @Francesco_De_Vincentis
the extract_model_data() function needs the projpred's internal function lhs() thet seems to be not exposed by the package.
I tried to copy it from the source code but I need to copy-paste a lot of dependent functions too.
I use the version 2.0.2 of projpred.
Could you please help us to expose the relevant functions?
Thanks in advance!

fweber144 · August 21, 2021, 1:36pm

Have you tried projpred:::lhs()?

avehtari · September 13, 2021, 2:07pm

We have now a preprint out on using projpred for, e.g., ordinal and survival models

The code is in the latent_projection branch of the projpred github repo, if you want to test. We’re working on a vignette and additional testing, before merging to main branch.

fabio · September 13, 2021, 4:55pm

Soooo thanks @avehtari @paul.buerkner @AlejandroCatalina !

JohnTownsend92 · October 8, 2021, 11:53am

Thanks for making this available @avehtari ! Unfortunately I can’t get it working with a cumulative link model.

I installed projpred with the latent_projection branch using:
devtools::install_github(‘stan-dev/projpred’, build_vignettes = TRUE, ref=‘latent_projection’)

I then perform an ordinal regression using brms (version 2.16.2) as follows:
fit ← brm(var ~ A+B+C+D+E+F, x, family=cumulative())

I then try to perform variable selection using projpred (version 2.0.5.9000) as follows:
ref ← get_refmodel(fit)

But I get the following error message:
Error in extend_family(family) :
Family ‘cumulative’ is not supported by projpred.

Do you have any idea what is going on here?

Many thanks,
John

AlejandroCatalina · October 8, 2021, 12:32pm

Yes, we are still working on the branch and somethings are still under heavy development. We will hopefully have it fully ready soon!

Best,
Alex

JohnTownsend92 · October 8, 2021, 1:59pm

Cool @AlejandroCatalina ! Will keep an eye out for this, looking forward to this being available!

JohnTownsend92 · October 13, 2021, 9:51am

@AlejandroCatalina In the meantime, I still need to do some variable selection with ordinal models. Currently I am using a simple forward feature selection algorithm to pick variables to add to the model by trying to minimise LOOIC. Could you comment on whether this is a sound approach? In particular with respect to overfitting? Many thanks, John

AlejandroCatalina · October 13, 2021, 9:54am

It sounds like it can potentially overfit. You can still use projpred with a small caveat, but you need to manually build the refmodel object and supply Gaussian as family with latent=TRUE. The caveat si that the reported ELPD will be on the latent Gaussian space, instead of being the response scale. I can share couple lines of code to do this if you’re interested.

Best,
Alex

JohnTownsend92 · October 13, 2021, 10:24am

@AlejandroCatalina Are there any issues with reporting ELPD will be on the latent scale? Am I correct that ELPD on the latent scale will still allow me to pick the “best” model?

If you could share the code on how to build the refmodel object and run projpred then that would be fantastic. I am trying to run ordinal regressions using cumulative link models and stopping ratio models.

Much appreciated,
John

AlejandroCatalina · October 13, 2021, 10:43am

Yes indeed the selection is the same, it’s just that you might be looking at the ELPD values thinking that they are on the response scale.

The idea behind the latent approach is to fit the projections on the latent predictive space, meaning we in practice replace the original response by the latent prediction as computed by t(posterior_linpred(fit, transform=FALSE)). So, with this said, you can follow this snippet (and fix whatever necessary):

## asumming you have defined an object `fit`

y_latent <- t(posterior_linpred(fit, transform=FALSE))

data[[".y"]] <- rowMeans(y_latent)

dis_latent <- rep(1, n_draws_fit) # n_draws_fit needs to match the number of posterior draws in fit

ref_formula <- update(formula(fit), ".y ~ .")

ref <- init_refmodel(fit,
data = data, formula = ref_formula, family = gaussian(),
ref_predfun = ref_predfun, div_minimizer = projpred:::linear_mle,
proj_predfun = projpred:::linear_proj_predfun, dis = dis_latent,
extract_model_data = extract_model_data, cvfun = cvfun
)

The only missing pieces are ref_predfun, cvfun and extract_model_data, that are defined as:

cvfun <- function(folds, ...) {
  cvres <- brms::kfold(
    object,
    K = max(folds), save_fits = TRUE, folds = folds, ...
  )
  fits <- cvres$fits[, "fit"]
  return(fits)
}

extract_model_data <- function(object, newdata = NULL, wrhs = NULL,
                               orhs = NULL, extract_y = TRUE) {
  if (!extract_y) {
    resp_form <- NULL
  } else {
    resp_form <- ~.y
  }

  if (is.null(newdata)) {
    newdata <- data
  }

  if (is.null(wrhs) && !is.null(object) &&
    !is.null(object$weights) && length(object$weights) != 0) {
    wrhs <- ~weights
    newdata <- cbind(newdata, weights = object$weights)
  }

  if (is.null(orhs) && !is.null(object) &&
    !is.null(object$offset) && length(object$offset) != 0) {
    orhs <- ~offset
    newdata <- cbind(newdata, offset = object$offset)
  }

  args <- nlist(object, newdata, wrhs, orhs, resp_form)
  return(do_call(projpred:::.extract_model_data, args))
}

ref_predfun <- function(fit, newdata = NULL) {
  return(t(posterior_linpred(fit,
    newdata = newdata,
    transform = FALSE
  )))
}

ACB · October 13, 2021, 1:04pm

I have also been looking at variable selection in ordinal models. One “sub-optimal strategy” (OK, I admit hack) is to first build the full ordinal response model in brms, then sample from the linear predictor to convert the ordinal response to a continuous response. Then build a full model on this continuous response and use varsel to select the terms.

Plots to compare the ordinal response values to the linear predictor samples and plots comparing ordinal and linear predictor model coefficients show if the linear predictor model is a reasonable approximation.

I have found that this works in a simulation with 3 or 4 factors, varsel either selects the true model structure in the fake data or misses by one term. Building a few ordinal response models around suggest_size and comparing information criteria has always found the fake data structure.

But these are tiny datasets, I do not know how this would work with your huge dataset.

I would be very pleased to see any opinions on this hack - is it reasonable or too dangerous to use!

AlejandroCatalina · October 13, 2021, 1:06pm

That is indeed very related to our latent approach. There is some uncertainty regarding what “build a full model on this continuous response” means, but it is anyway very related to our approach and therefore principled as far as our latest developments goes.

Best,
Alex

JohnTownsend92 · October 14, 2021, 9:44am

@AlejandroCatalina Thanks for sharing this code. I managed to build the refmodel with a few edits to the code. But some weird stuff happens when I perform the variable selection.

Firstly, when I run the following:
vs <- varsel(ref, method = "forward", nterms_max = projpred:::count_terms_in_formula(ref$formula))
plot(vs, stats = c('elpd', 'rmse'))

I get the the following plot:
elpd_rmse

Does this look right? It strikes me as odd that the model with 47 terms (all of the terms in my full model) does not reach the same elpd or rmse values as the full model indicated by the dashed lines.

Secondly, the cross-validated variable selection fails. When I run the following:
vs <- cv_varsel(ref, method = "forward", nterms_max = projpred:::count_terms_in_formula(ref$formula))

I get the following error message:

[1] “Repeating forward search for 165 LOO folds…”
| | 0%
warning: solve(): system is singular; attempting approx solution
The search direction is not a descent direction (newton decrement = -0, should be positive), , this is likely a bug. Please report to the developers.

warning: solve(): system is singular; attempting approx solution
The search direction is not a descent direction (newton decrement = -0, should be positive), , this is likely a bug. Please report to the developers.

warning: solve(): system is singular; attempting approx solution
The search direction is not a descent direction (newton decrement = -0, should be positive), , this is likely a bug. Please report to the developers.

warning: solve(): system is singular; attempting approx solution
The search direction is not a descent direction (newton decrement = -0, should be positive), , this is likely a bug. Please report to the developers.

warning: solve(): system is singular; attempting approx solution
The search direction is not a descent direction (newton decrement = -0, should be positive), , this is likely a bug. Please report to the developers.

warning: solve(): system is singular; attempting approx solution
The search direction is not a descent direction (newton decrement = -0, should be positive), , this is likely a bug. Please report to the developers.

warning: solve(): system is singular; attempting approx solution
The search direction is not a descent direction (newton decrement = -0, should be positive), , this is likely a bug. Please report to the developers.

warning: solve(): system is singular; attempting approx solution
The search direction is not a descent direction (newton decrement = -0, should be positive), , this is likely a bug. Please report to the developers.

warning: solve(): system is singular; attempting approx solution
The search direction is not a descent direction (newton decrement = -0, should be positive), , this is likely a bug. Please report to the developers.

warning: solve(): system is singular; attempting approx solution
The search direction is not a descent direction (newton decrement = -0, should be positive), , this is likely a bug. Please report to the developers.

warning: solve(): system is singular; attempting approx solution
The search direction is not a descent direction (newton decrement = -0, should be positive), , this is likely a bug. Please report to the developers.

warning: solve(): system is singular; attempting approx solution
The search direction is not a descent direction (newton decrement = -0, should be positive), , this is likely a bug. Please report to the developers.

warning: solve(): system is singular; attempting approx solution
The search direction is not a descent direction (newton decrement = -0, should be positive), , this is likely a bug. Please report to the developers.

warning: solve(): system is singular; attempting approx solution
The search direction is not a descent direction (newton decrement = -0, should be positive), , this is likely a bug. Please report to the developers.

warning: solve(): system is singular; attempting approx solution
The search direction is not a descent direction (newton decrement = -0, should be positive), , this is likely a bug. Please report to the developers.

warning: solve(): system is singular; attempting approx solution
The search direction is not a descent direction (newton decrement = -0, should be positive), , this is likely a bug. Please report to the developers.

warning: solve(): system is singular; attempting approx solution
The search direction is not a descent direction (newton decrement = -0, should be positive), , this is likely a bug. Please report to the developers.

warning: solve(): system is singular; attempting approx solution
The search direction is not a descent direction (newton decrement = -0, should be positive), , this is likely a bug. Please report to the developers.

error: matrix multiplication: incompatible matrix dimensions: 0x1 and 2x1
Error in glm_ridge_c(x, pseudo_obs, lambda, FALSE, 0, beta_start, w0, :
matrix multiplication: incompatible matrix dimensions: 0x1 and 2x1
In addition: Warning message:
Some Pareto k diagnostic values are too high. See help(‘pareto-k-diagnostic’) for details.

Any ideas about what is going on here?

Thanks,
John

AlejandroCatalina · October 14, 2021, 11:52am

Many things could be happening, but it seems the reference model may have some issues. You can try forward search my passing method=“forward” to varsel. That should be a bit better but if there are many issues it may not resolve it. As to the first varsel plot it seems to indicate that the evaluation of the reference model may be overoptimistic.

Best,
Alex

JohnTownsend92 · October 19, 2021, 8:43am

Thanks for the comment @AlejandroCatalina

I can get rid of the second error when I do some preliminary feature selection (remove variables which are clearly not important).

As for the varsel plot, I’m still confused why the reference model and the model with all of the features (in my case 47 / 47 features) have different elpd and rmse values. I would have expected them to be the same as they are the same model, right? Could you please explain if there is something which I am not understanding.

Many thanks,
John

AlejandroCatalina · October 19, 2021, 8:57am

You should be able to recover the identical behavior by increasing the amount of draws used for prediction with the ndraws_pred argument.

Best,
Alex

Topic		Replies	Views
Projpred: latent projection for hierarchical ordinal models General projpred	8	204	July 9, 2025
Accounting for measurement error during variable selection with projpred (possibly with rstanarm or brms?) Modeling	25	1533	July 31, 2025
Inquiries about variable selection using projpred Modeling techniques , loo	12	1310	October 30, 2019
Logistic model selection in rstan Modeling rstan	8	1040	April 6, 2023
Variable selection in Brms brms	10	1936	March 2, 2020

Variable selection with ordinal model

Related topics