Horseshoe prior for varying slopes and intercepts model

I have data where I am looking at predicting drug resistance (binary) by the presence/absence of different genes. Few genes are thought to have any real effect. In addition, the presence of some genes is very low, and the presence/absence can be highly correlated among genes. This can make for separation problems in the logistic regression. My goal is prediction. I want to predict the outcome (drug resistance) for a variety of drugs using the genes.

Let’s say I have a bunch of genes and a single drug, then I could simply fit a logistic regression model and specify a regularized horseshoe prior. However, I have many drugs, which are somewhat related, so I would actually like to specify a multilevel model, such that information is partially pooled across drugs for the effect of the gene on the outcome resistance.

I have simulated some data that looks pretty close to my real data, by specifying an uncorrelated varying intercepts and varying slopes model to simulate the data, where the varying effects follow a student t distribution (when I run a multilevel model on my real data, all of the correlations between varying effects are around 0.0 with credible intervals that broadly (and rather evenly) cover both sides of zero).
See this txt file for the code for the data simulation and model.

code simulation uncorrelated varying intercepts slopes resistance data.txt (3.3 KB)

The model that I have run looks like this:

#set up model, varying intercepts and slopes model where varying effects modeled via student distribution
bf_logistic <- bf(y ~ 1 + gene1 + gene2 + gene3 + gene4 + gene5 + gene6 + gene7 + gene8 + gene9 + gene10 +
		 gene11 + gene12 + gene13 + gene14 + gene15 + (1 + gene1 + gene2 + gene3 + gene4 + gene5 + gene6 + gene7 + gene8 + gene9 + gene10 +
		 gene11 + gene12 + gene13 + gene14 + gene15 | gr(drug_f, dist="student"))) + bernoulli()

#set up regularized hs prior for logistic model
p <- 5 #guess at number of relevant variables
D <- 16 #number of variables in the model (is this correct even for hierarchical model??? I assume it is number of population-level effects)
sigma <- 2   #as recommended for logistic regression in slide 14 https://github.com/avehtari/modelselection/blob/master/regularizedhorseshoe_slides.pdf
n <- nrow(d1)   #number of observations in dataset
scale_global <- (p/(D-p))*(sigma/sqrt(n))

prior1 <- c(
	prior(horseshoe(df = 1, scale_global = scale_global), class=b),
	prior(cauchy(0, 1), class=sd),
	prior(lkj(2), class=cor)
	)

m1 <- brm(bf_logistic,
	prior = prior1, data=d1, chains=4, iter=2000, warmup=1000,
	control = list(adapt_delta = 0.999, max_treedepth=13), cores=4)

The model takes a little bit of time to run, so here are the results from the above sim and model run
sim model output.txt (13.2 KB)

Notice I have specified the regularized horseshoe prior for the ‘population-level’ effects, and a cauchy prior for the varying effects. What I would like to do is to model the sparsity in the population-level effects and allow the genes to effect the outcome differently for different drugs (but share information; partial pooling), but at the same time, I also want a similar sparse prior on the varying effects.

I am trying to wrap my mind around what it means to impose the hs prior on the population-level effects but then have all those varying slopes in the model… do the varying slopes sort of ‘water down’ the effect of the hs prior? In other words, I use this hs prior with the goal of pulling noise toward zero, but then I have a varying slope, so is it possible that the varying slope simply allows the noise to exist in the model even though the hs prior has done it’s job on the population-level effects?
Is my cauchy prior a good choice?

Based on this plot, it appears the hs prior has done it’s job for the population-level effects.

Also, using coef(m1) in brms, I can view the slopes for each gene for each drug, and it appears that the combination of the hs prior and cauchy prior works as intended.
coef m1.txt (10.3 KB)

I would appreciate thoughts on this! It would be great if someone could confirm that using a hs prior in a multilevel model such as described above isn’t a poor idea. Thanks!

@avehtari

They can. It would be good if you can assume that the variation around the population level slope is smallish, or otherwise you will can explain either with population level slope or with the group level slopes and thsu have multimodality.

Probably too thick tailed, unless you have informative data.

Then it seems you have informative data.

There are many papers presenting also methods with inferring similarity between genes and drugs. You have total of N_genes x N_drugs slopes. If you think that as a matrix, then you have 0-median horseshoe prior on row mean, and row-median Cauchy prior on columns. You could also assume there is more structure, so that there would the matrix would have a sparse block structure. Unfortunately I couldn’t remember the names of such papers, and my quick googling didn’t find those, but I let you know if I remember.

It can work, and it seems it works for you because you have informative data. With less informative data, this could be too weak and more structure would be needed.

It also looks like there are collinear predictors, as some of the marginals are quite wide. Those predictors can be highly predictive, even if the marginals would be overlapping zero, see e.g. casestudy Bayesian Logistic Regression with rstanarm

Thanks for the helpful responses!!

This was my concern… yes, it would seem that for this to work the assumption is that the sd for group level slopes is low. I think in some of my real data I can see multi-modality in the plots of the posteriors for some coefficients of population-level slopes, and I think this may be for the genes where the sd for the varying slopes for that drug is large. I may group drugs differently and run separate models. The main reason to use the partial pooling approach is that for some genes, the presence is very low (like maybe only seen present twice out of 85 observations), and the use of partial pooling across many drugs effectively gives us more data for those genes with very low presence than we would have running a separate model on each drug.

Yes, I realized that. I switched to student-t.

I am not sure what a matrix with a sparse block structure would look like from the model specification perspective… I will try to see if I can find those papers. Thanks.

That plot was just from the little data sim that I wrote for this forum question. I’m actually only interested purely in prediction, so I thought I would leave all genes in the model rather than using something like projection predictive variable selection to find the best sub-model.