Impact of the par_ratio option of the horseshoe prior

Hi,

I have a question with respect to the specification of the horseshoe prior.
If I set the par_ratio option to a specific value X (the ratio of the expected number of non-zero coefficients to the expected number of zero coefficients), will this force the model to have exactly that number of non-zero coefficients, or does the specification of the prior merely give that many coefficients the chance to escape shrinkage?

The second question is with respect to counting the number of coefficients.
If I have a categorical model and predictors with several categorical levels, the model learns separate coefficients for every combination of predictor level and outcome category (minus the reference levels). Is this correct? If so, when I set the par_ratio option of the horseshoe prior, I have to keep in mind that it is not about how many predictors I expect to influence the outcome, but about all combinations of predictor levels and outcome categories?

Thanks in advance!

If I set the par_ratio option to a specific value X (the ratio of the expected number of non-zero coefficients to the expected number of zero coefficients), will this force the model to have exactly that number of non-zero coefficients, or does the specification of the prior merely give that many coefficients the chance to escape shrinkage?

Welcome! The model isn’t forced to have exactly this number of non-zero coefficients; it’s just a prior.
The horseshoe prior is typically also not very sensitive to the exact expected number of non-zero coefficients, so a reasonable guess is usually enough.
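
For concreteness, here is a minimal sketch of how such a prior can be set in brms. The data frame `dat` and the predictor names are made up for illustration; `horseshoe()` and its `par_ratio` argument are the actual brms interface.

```r
library(brms)

# Hypothetical data frame `dat` with outcome y and 20 candidate predictors.
# par_ratio = 0.1 encodes the guess that the expected number of non-zero
# coefficients is about one tenth of the expected number of zero ones.
fit <- brm(
  y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 +
      x11 + x12 + x13 + x14 + x15 + x16 + x17 + x18 + x19 + x20,
  data = dat,
  prior = set_prior(horseshoe(par_ratio = 0.1), class = "b")
)
```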

Some background info in case it’s useful:

If your regression coefficients are called \beta_j, then the horseshoe prior is defined as

\beta_j \sim \text{Normal}(0, \tau^{2}\lambda_{j}^{2})
\lambda_{j} \sim \text{C}^{+}(0, 1)
\tau \sim \text{?}

where \tau is called the “global” shrinkage parameter and the \lambda_{j} are called the “local” shrinkage parameters. A useful intuition from the regularized horseshoe paper by Piironen and Vehtari (https://arxiv.org/pdf/1707.01694.pdf) is that the global shrinkage parameter pulls all the estimates towards 0, while the local shrinkage parameters with their half-Cauchy priors allow some coefficients to escape that shrinkage.

The prior on \tau has historically just been \tau \sim \text{C}^{+}(0, 1), but in the regularized horseshoe paper, Piironen and Vehtari show that this prior is usually much too wide, and that a better choice is \tau \sim \text{C}^{+}(0, (\text{par\_ratio} \cdot \frac{1}{\sqrt{N}})^{2}) (when the residual standard deviation is 1).
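
As a small numeric illustration (all numbers made up): with D = 100 candidate coefficients, p_0 = 5 expected non-zero ones, and N = 200 observations, the implied scale works out as follows.

```r
p0 <- 5     # expected number of non-zero coefficients
D  <- 100   # total number of coefficients
N  <- 200   # number of observations

par_ratio <- p0 / (D - p0)       # 5 / 95, about 0.053
tau0 <- par_ratio * 1 / sqrt(N)  # about 0.0037 (residual sd taken to be 1)
```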

The reason for that is a bit lengthy to explain here, but the short version is that you can define something called the “shrinkage factor” \kappa_{j}, which describes how much each \beta_{j} is shrunk away from the maximum likelihood solution and towards zero.
This shrinkage factor has \tau in its definition, and if we want to put a prior on the effective number of non-zero coefficients (which is the sum over all 1 - \kappa_{j}), then the prior on \tau should have that special form.
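
Sketching the argument from the paper (assuming unit-variance predictors; \sigma is the residual standard deviation):

\kappa_{j} = \frac{1}{1 + N \sigma^{-2} \tau^{2} \lambda_{j}^{2}}
m_{\mathrm{eff}} = \sum_{j=1}^{D} (1 - \kappa_{j})
\mathbb{E}[m_{\mathrm{eff}} \mid \tau, \sigma] = \frac{\tau \sigma^{-1} \sqrt{N}}{1 + \tau \sigma^{-1} \sqrt{N}} D

Setting this expectation equal to the expected number of non-zero coefficients p_0 and solving for \tau gives \tau_{0} = \frac{p_0}{D - p_0} \frac{\sigma}{\sqrt{N}}, i.e. exactly the \text{par\_ratio} \cdot \frac{\sigma}{\sqrt{N}} scale above.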

The second question is with respect to counting the number of coefficients.
If I have a categorical model and predictors with several categorical levels, the model learns separate coefficients for every combination of predictor level and outcome category (minus the reference levels). Is this correct? If so, when I set the par_ratio option of the horseshoe prior, I have to keep in mind that it is not about how many predictors I expect to influence the outcome, but about all combinations of predictor levels and outcome categories?

Unfortunately I can’t really help you much with your second question. Your reasoning sounds correct to me (one regression coefficient per level, minus one for the reference level, per non-reference outcome category), but I’m unsure how brms handles the pseudo variance for more than two possible outcomes, so someone else will have to chime in.
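
If it helps, one way to count the coefficients that par_ratio refers to is to build the design matrix and multiply by the number of non-reference outcome categories. A rough sketch with made-up variable names:

```r
# Hypothetical data frame `dat`: categorical outcome y with 3 levels,
# predictors f1 (4 levels) and f2 (3 levels).
X <- model.matrix(~ f1 + f2, data = dat)

# Dummy-coded predictor columns, excluding the intercept:
# (4 - 1) + (3 - 1) = 5 columns here.
n_pred <- ncol(X) - 1

# In a categorical model, each non-reference outcome category gets its own
# set of these coefficients: 5 * (3 - 1) = 10 coefficients in total.
n_coef <- n_pred * (nlevels(dat$y) - 1)
```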


par_ratio replaces p_0/(D - p_0) in eq. (3.12), and the resulting \tau_0 is used as the scale parameter of the half-Cauchy prior for \tau. The effect is that the prior over m_\mathrm{eff} is quite wide, as shown in Figure 3 of Sparsity information and regularization in the horseshoe and other shrinkage priors.

Oops, I just noticed that @daniel_h answered while I was typing my answer.

Thanks to both of you for answering my questions!