Horseshoe prior on subset of predictors

I have gene expression for > 1000 genes and a few important clinical predictors, e.g. age and gender. I would like to be able to specify a regularizing horseshoe prior on all of the genes, but a weakly regularizing prior on the clinical predictors (e.g. normal(0, 1)). However, this doesn’t seem to be possible. Is this something that can be made possible in a future release or are there issues with doing this?

Thanks!

See below for a simulated example.

set.seed(123)
#Simulated gene expression
genes <- data.frame(expression=rnorm(100*1000, mean=10, sd=2),
                    geneID=rep(paste0("Gene", 1:1000), each=100),
                    subjectID=rep(paste0("Subject", 1:100), 1000)) %>% 
  spread(key=geneID, value=expression) %>% 
  select(-subjectID)

#Simulate outcome, age, gender and then bind with gene expression
sim_data <- data.frame(y=rnorm(100), age=rnorm(100, 50, 5), gender=rbernoulli(100)*1) %>% 
  bind_cols(., genes)

#Model with different priors on genes, age, gender
fit_simulated <- brm(y ~ .,
                     prior=c(set_prior("normal(0, 1)", coef="age"),
                             set_prior("normal(0, 1)", coef="gender"),
                             set_prior("horseshoe(1)", class="b")),
                     data=sim_data)

This results in the following error:

Error: Defining priors for single population-level parameters is not allowed when using horseshoe or lasso priors (except for the Intercept).

Please also provide the following information in addition to your question:

  • Operating System: macOS High Sierra 10.13.1
  • brms Version: 2.8.0

Have a look to the model in github, I have done exactly this.

stemangiola/TABI

Thanks @stemangiola. I’m wondering why this is not allowed in brms, @paul.buerkner?

Use brms’ non-linear syntax as follows

bf(y ~ a + b, nl = TRUE) +
  lf(a ~ <predictors with horseshoe prior>, center = TRUE) +
  lf(b ~ 0 + <other predictors>, cmc = FALSE)

prior(horseshoe(1), class = "b", nlpar = "a") +
  prior(normal(0, 1), class = "b", nlpar = "b")
1 Like

Thanks!

This won’t work for ordinal models though (Error: Non-linear formulas are not yet allowed in ordinal models.)

Any workaround for ordinal models with a subset of predictors with horseshoe priors and a subset with a weakly regularizing priors?

No workaround I am aware of. Would you mind opening an issue on https://github.com/paul-buerkner/brms about supporting non-linear formulas in ordinal models?

Will do - thanks for your quick reply.

Thanks again for the quick update to get this working. One follow-up question regarding your code. Shouldn’t the model for a also be run without an intercept and cmc=FALSE? This way, the only intercepts that will be estimated are the K-1thresholds. The way you wrote it, K-1 thresholds will be estimated along with an intercept for a.

You are right, but this is true only for ordinal models whose intercepts (thresholds) are handeled in a special way. In my above proposal, I didn’t know you wanted to use an ordinal model.

Makes sense. Thanks for the clarification

I ran into this problem as well, the error here doesn’t make sense, it is not a single parameter it is the whole class after excluding two parameters.

There’s a different error if you specify the coefficients explicitly, e.g. changing

fit_simulated <- brm(y ~ .,
                     prior=c(set_prior("normal(0, 1)", coef="age"),
                             set_prior("normal(0, 1)", coef="gender"),
                             set_prior("horseshoe(1)", class="b")),
                     data=sim_data)

to

fit_simulated <- brm(y ~ .,
                     prior=c(set_prior("normal(0, 1)", coef="age"),
                             set_prior("normal(0, 1)", coef="gender"),
                             set_prior("horseshoe(1)", class="b", coef = "Gene1")),
                     data=sim_data)

You get a syntax error from RStan

Error in stanc(model_code = paste(program, collapse = "\n"), model_name = model_cppname,  : 
  failed to parse Stan model 'file61d443929b8a' due to the above error.
SYNTAX ERROR, MESSAGE(S) FROM PARSER:

No matches for: 

  horseshoe_lpdf(real, int)

Function horseshoe_lpdf not found.
  error in 'model61d465e2b3c9_file61d443929b8a' at line 32, column 37
  -------------------------------------------------
    30:   target += normal_lpdf(b[1] | 0, 1);
    31:   target += normal_lpdf(b[2] | 0, 1);
    32:   target += horseshoe_lpdf(b[3] | 1);
                                            ^
    33:   target += student_t_lpdf(temp_Intercept | 3, 0, 10);
  -------------------------------------------------

You also get a different syntax error if you run

fit_simulated <- brm(y ~ .,
                     prior= set_prior(horseshoe(1), class="b", coef = "Gene1")),
                     data=sim_data)

you get

Error in stanc(model_code = paste(program, collapse = "\n"), model_name = model_cppname,  : 
  failed to parse Stan model 'file61d428e713e1' due to the above error.
SYNTAX ERROR, MESSAGE(S) FROM PARSER:

variable "df" does not exist.
  error in 'model61d462fa1264_file61d428e713e1' at line 30, column 37
  -------------------------------------------------
    28:   vector[N] mu = temp_Intercept + Xc * b;
    29:   // priors including all constants
    30:   target += horseshoe_lpdf(b[3] | df = 1);
                                            ^
    31:   target += student_t_lpdf(temp_Intercept | 3, 0, 10);
  -------------------------------------------------

Thanks for sharing! Do you have a minimally reproducible example for me to clean up the error messages and make them more informative? I would much appreciate it!

Sure, I think there are 3 potential issues here, I can open one with all 3, or 3 separate, whichever you’d prefer

Opening one issue for all 3 would suffice I think. Thank you!