Structured sparsity, grouped horseshoe

Sorry for the long post - please let me know if this format of question is welcome here or if it needs to be more on topic.
My fundamental question is: is there any literature on, or experience with, modeling structured sparsity as described here?

I’m currently trying to model longitudinal data using a linear mixed model with random intercepts and slopes and lots of candidate predictors. These predictors are structured in a tree-like way: think of different measurement platforms (layer 1), each measuring different markers (the individual predictors, layer 3). Most of these markers belong to non-overlapping groups (layer 2), which are defined by domain-specific knowledge and show up as strong correlations. That is, we have a coarse grouping (layer 1), a finer grouping (layer 2, nested within layer 1), and the individual predictors (layer 3, nested within layer 2).

The goal is to find important predictors.

Due to the complexity of this task, I decided to give Bayesian modeling a try. Several possible approaches come to mind, focusing on the modeling of layers 2 and 3 for now:

  1. The base model: ignore the structure and just model layer 3. I apply the regularized horseshoe for this.
  2. Grouped horseshoe 1: model each coefficient as \beta_j \sim N(0, \tau \lambda_{G_j} \lambda_j), where \tau and \lambda_j are the usual global and local scales of the regularized horseshoe and \lambda_{G_j} serves as an intermediate scale for the group G_j that predictor j belongs to.
  3. Grouped horseshoe 2: as suggested by avehtari here, the local scales could have group-specific parameters.
  4. Multivariate horseshoe: suggested here. This would allow coefficients within a group to be correlated.
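
To make Model 2 concrete, here is a minimal NumPy sketch that draws coefficients from its prior. It is only an illustration under assumptions that are not fixed above: all three scales are half-Cauchy, the product \tau \lambda_{G_j} \lambda_j is treated as a standard deviation, and the slab part of the regularized horseshoe is left out. The function name and the `tau0` argument are made up for this example.

```python
import numpy as np

def sample_grouped_horseshoe_prior(group_index, n_draws=1000, tau0=1.0, seed=0):
    """Draw beta_j ~ N(0, (tau * lambda_{G_j} * lambda_j)^2) from the prior.

    Assumptions (not from the post): half-Cauchy priors on every scale,
    the product of scales is a standard deviation, no slab regularization.
    """
    rng = np.random.default_rng(seed)
    group_index = np.asarray(group_index)      # group_index[j] = G_j, coded 0..n_groups-1
    n_groups = group_index.max() + 1
    n_coef = group_index.size

    # Half-Cauchy draws: absolute value of a standard Cauchy draw.
    tau = tau0 * np.abs(rng.standard_cauchy((n_draws, 1)))            # global scale
    lambda_group = np.abs(rng.standard_cauchy((n_draws, n_groups)))   # one scale per group
    lambda_local = np.abs(rng.standard_cauchy((n_draws, n_coef)))     # one scale per predictor

    # Each predictor picks up the scale of its own group.
    sd = tau * lambda_group[:, group_index] * lambda_local
    beta = rng.normal(0.0, 1.0, size=sd.shape) * sd
    return beta, lambda_group

# Six predictors in three groups of sizes 3, 2 and 1.
group_index = [0, 0, 0, 1, 1, 2]
beta, lambda_group = sample_grouped_horseshoe_prior(group_index)
```

Plotting the prior draws of `beta` per group is a quick way to see how much extra shrinkage the product of three heavy-tailed scales implies compared with Model 1.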

My intuitions are the following:

  • Model 2 allows some groups to be shrunk less than others, making it easier for predictors from these groups to contribute to the predictions.
  • Models 2 and 3 should be mostly equivalent, with Model 2 being conceptually simpler to generalize to another layer.
  • Model 4 seems too complicated for a large number of covariates, with too many parameters.

I have already compared Models 1 and 2, and the preliminary results agree with my expectations:
Model 2 seems to favor one group (the one with the least group-level shrinkage), so more predictors from this group end up in the top X list of predictors than under Model 1.

My remaining questions:

  1. Do you see any fundamental issues with Model 2?
  2. Is there an “easy” way to “push” the group effect onto a single representative from a group? This could probably be done using PCA per group or similar, but the real goal would be to end up with only a single (or very few) candidates per group rather than having to measure all of them.
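
For reference, the per-group PCA idea could look roughly like the NumPy sketch below. It standardizes the predictors, takes the first principal component of each group, and also reports the marker with the largest absolute loading on that component. Using that marker as the group representative is just one possible heuristic assumed for this example, not an established method, and the function name is made up.

```python
import numpy as np

def first_pc_per_group(X, group_index):
    """Return first-PC scores per group and one high-loading marker per group.

    Assumptions: X has no constant columns; the representative is chosen as
    the column with the largest absolute loading on the first PC (a heuristic).
    """
    X = np.asarray(X, dtype=float)
    group_index = np.asarray(group_index)
    Xc = X - X.mean(axis=0)
    Xs = Xc / Xc.std(axis=0)                    # standardize each predictor

    scores, representatives = [], []
    for g in np.unique(group_index):
        cols = np.flatnonzero(group_index == g)
        # SVD of the standardized block: rows of Vt are the PC loadings.
        U, s, Vt = np.linalg.svd(Xs[:, cols], full_matrices=False)
        scores.append(U[:, 0] * s[0])           # first-PC scores for this group
        representatives.append(int(cols[np.argmax(np.abs(Vt[0]))]))
    return np.column_stack(scores), representatives

# Toy data: 50 observations, six predictors in three groups.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 6))
group_index = [0, 0, 0, 1, 1, 2]
scores, representatives = first_pc_per_group(X, group_index)
```

The component scores still depend on every marker in the group, so this alone does not remove the need to measure all of them; only the `representatives` part addresses that, and only heuristically.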

Thanks for looking!

Hi @matherealize,

This is not a much-researched topic. There are related group lasso and network lasso papers, and some of those ideas might work with the horseshoe, too. Much of the literature focuses on cases where the groups are unknown, which makes the problem much harder, but if you know the groups you could do something like this