User-friendly implementation of regularised horseshoe

stemangiola · May 22, 2019, 12:26pm

Hello @avehtari and all,

even though I have used regularised horseshoe before, I am getting a bit lost in my code I was wondering if there is a user friendly implementation, not much more complex than

beta ~ double_exponential(0, 1);

The goal is to control better

the percentage of “outliers”
the scale for outliers

Thanks

ldeschamps · May 22, 2019, 12:39pm

Hi!

I like Paul’s function in brms! You can get it when specifying a formula with horseshoe prior and displaying the code with make_stancode.

Cheers,
Lucas

stemangiola · May 22, 2019, 12:42pm

Thanks Lucas I don’t use much brms, would you mind when you have sometime to paste a minimal example?

ldeschamps · May 22, 2019, 12:43pm

No problem, I’ll do that when I open my computer
:)

ldeschamps · May 22, 2019, 1:41pm

If I define the following data and model:

library(brms)
library(rethinking)

## Sample the predictors
X <- cbind(1, rmvnorm2(100, Mu = rep(0,100), sigma = rep(1, 100), Rho = rlkjcorr(1, 100, 1)))
## Sample the parameters
beta <- rnorm(101)

## Compute the linear predictor
mu <- X %*% beta
## Sample the observed values
y <- rnorm(100, mu, sd = 0.5)

## Store everything in a data.frame
Dat <- data.frame(y = y)
Dat$X <- X

## Define the formula
formul <- bf(y ~ X)
## Define the priors
priors <- prior(horseshoe(df = 1, par_ratio = 0.5), class= b)

## DIsplay stan code
make_stancode(formul, data = Dat, prior = priors)

I obtain the following stan code, with the prior being scaled by the residual standard-error.

// generated with brms 2.8.8
functions {

  /* Efficient computation of the horseshoe prior
   * Args:
   *   zb: standardized population-level coefficients
   *   global: global horseshoe parameters
   *   local: local horseshoe parameters
   *   scale_global: global scale of the horseshoe prior
   *   c2: positive real number for regularization
   * Returns:
   *   population-level coefficients following the horseshoe prior
   */
  vector horseshoe(vector zb, vector[] local, real[] global,
                   real scale_global, real c2) {
    int K = rows(zb);
    vector[K] lambda = local[1] .* sqrt(local[2]);
    vector[K] lambda2 = square(lambda);
    real tau = global[1] * sqrt(global[2]) * scale_global;
    vector[K] lambda_tilde = sqrt(c2 * lambda2 ./ (c2 + tau^2 * lambda2));
    return zb .* lambda_tilde * tau;
  }
}
data {
  int<lower=1> N;  // number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  real<lower=0> hs_df;
  real<lower=0> hs_df_global;
  real<lower=0> hs_df_slab;
  real<lower=0> hs_scale_global;
  real<lower=0> hs_scale_slab;
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int Kc = K - 1;
  matrix[N, Kc] Xc;  // centered version of X
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  // local parameters for horseshoe prior
  vector[Kc] zb;
  vector<lower=0>[Kc] hs_local[2];
  real temp_Intercept;  // temporary intercept
  // horseshoe shrinkage parameters
  real<lower=0> hs_global[2];
  real<lower=0> hs_c2;
  real<lower=0> sigma;  // residual SD
}
transformed parameters {
  vector[Kc] b;  // population-level effects
  b = horseshoe(zb, hs_local, hs_global, hs_scale_global * sigma, hs_scale_slab^2 * hs_c2);
}
model {
  vector[N] mu = temp_Intercept + Xc * b;
  // priors including all constants
  target += normal_lpdf(zb | 0, 1);
  target += normal_lpdf(hs_local[1] | 0, 1)
    - 101 * log(0.5);
  target += inv_gamma_lpdf(hs_local[2] | 0.5 * hs_df, 0.5 * hs_df);
  target += student_t_lpdf(temp_Intercept | 3, 3, 10);
  target += normal_lpdf(hs_global[1] | 0, 1)
    - 1 * log(0.5);
  target += inv_gamma_lpdf(hs_global[2] | 0.5 * hs_df_global, 0.5 * hs_df_global);
  target += inv_gamma_lpdf(hs_c2 | 0.5 * hs_df_slab, 0.5 * hs_df_slab);
  target += student_t_lpdf(sigma | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10);
  // likelihood including all constants
  if (!prior_only) {
    target += normal_lpdf(Y | mu, sigma);
  }
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = temp_Intercept - dot_product(means_X, b);
}

If I remove the autoscaling

## Define the priors
priors <- prior(horseshoe(df = 1, par_ratio = 0.5, autoscale = F), class= b)

## DIsplay stan code
make_stancode(formul, data = Dat, prior = priors)

I obtain this code

// generated with brms 2.8.8
functions {

  /* Efficient computation of the horseshoe prior
   * Args:
   *   zb: standardized population-level coefficients
   *   global: global horseshoe parameters
   *   local: local horseshoe parameters
   *   scale_global: global scale of the horseshoe prior
   *   c2: positive real number for regularization
   * Returns:
   *   population-level coefficients following the horseshoe prior
   */
  vector horseshoe(vector zb, vector[] local, real[] global,
                   real scale_global, real c2) {
    int K = rows(zb);
    vector[K] lambda = local[1] .* sqrt(local[2]);
    vector[K] lambda2 = square(lambda);
    real tau = global[1] * sqrt(global[2]) * scale_global;
    vector[K] lambda_tilde = sqrt(c2 * lambda2 ./ (c2 + tau^2 * lambda2));
    return zb .* lambda_tilde * tau;
  }
}
data {
  int<lower=1> N;  // number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  real<lower=0> hs_df;
  real<lower=0> hs_df_global;
  real<lower=0> hs_df_slab;
  real<lower=0> hs_scale_global;
  real<lower=0> hs_scale_slab;
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int Kc = K - 1;
  matrix[N, Kc] Xc;  // centered version of X
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  // local parameters for horseshoe prior
  vector[Kc] zb;
  vector<lower=0>[Kc] hs_local[2];
  real temp_Intercept;  // temporary intercept
  // horseshoe shrinkage parameters
  real<lower=0> hs_global[2];
  real<lower=0> hs_c2;
  real<lower=0> sigma;  // residual SD
}
transformed parameters {
  vector[Kc] b;  // population-level effects
  b = horseshoe(zb, hs_local, hs_global, hs_scale_global, hs_scale_slab^2 * hs_c2);
}
model {
  vector[N] mu = temp_Intercept + Xc * b;
  // priors including all constants
  target += normal_lpdf(zb | 0, 1);
  target += normal_lpdf(hs_local[1] | 0, 1)
    - 101 * log(0.5);
  target += inv_gamma_lpdf(hs_local[2] | 0.5 * hs_df, 0.5 * hs_df);
  target += student_t_lpdf(temp_Intercept | 3, 3, 10);
  target += normal_lpdf(hs_global[1] | 0, 1)
    - 1 * log(0.5);
  target += inv_gamma_lpdf(hs_global[2] | 0.5 * hs_df_global, 0.5 * hs_df_global);
  target += inv_gamma_lpdf(hs_c2 | 0.5 * hs_df_slab, 0.5 * hs_df_slab);
  target += student_t_lpdf(sigma | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10);
  // likelihood including all constants
  if (!prior_only) {
    target += normal_lpdf(Y | mu, sigma);
  }
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = temp_Intercept - dot_product(means_X, b);
}

Cheers,
Lucas

stemangiola · May 22, 2019, 1:46pm

Thanks,

I wish there was a one line implementation of this ;)

bgoodri · May 22, 2019, 2:03pm

The one-line implementation tends to yield divergent transitions.

avehtari · May 22, 2019, 4:46pm

brms implementation follows what I recommend, too. It has the additional benefit of using non-centered parameterization, which reduces the probability of getting funnel shaped posterior which is more difficult to sample.

stemangiola · May 22, 2019, 9:39pm

Just to confirm,

the above implementation of reg_horseshoe does (or does not) have arguments as parameters?

It is analogous of (?)

~ double exponential(0,1)

or

~ double exponential(alpha, beta);
alpha ~ ..
beta ~ ..

I am looking for the case (1) where I impose the arguments and not trying to infer them.

For example, can I set the following parameters?, based on:

expected fraction of “putliers”
extected scale of outliers

  // local parameters for horseshoe prior
  vector<lower=0>[Kc] hs_local[2];

  // horseshoe shrinkage parameters
  real<lower=0> hs_global[2];
  real<lower=0> hs_c2;

I have compressed the code and moved to functions, and kept just the key three tuning arguments (In my option but I can be wrong), decreasing of more than half the number of code lines in the main body of the model.

  functions{
  
  	 vector horseshoe_get_tp(vector zb, vector[] local, real[] global, real scale_global, real c2) {
  	    int K = rows(zb);
  	    vector[K] lambda = local[1] .* sqrt(local[2]);
  	    vector[K] lambda2 = square(lambda);
  	    real tau = global[1] * sqrt(global[2]) * scale_global;
  	    vector[K] lambda_tilde = sqrt(c2 * lambda2 ./ (c2 + tau^2 * lambda2));
  	    return zb .* lambda_tilde * tau;
  	  }
  
    real horseshoe_get_lp(vector zb, vector[] local, real df, real[] global, real df_global, real c2, real df_slab){
  
      // real<lower=0> hs_df; // == 1  // If divergencies increase this
  	  // real<lower=0> hs_df_global; // == 1
  	  // real<lower=0> hs_df_slab; // == 4 // df of the outliers
  
    	vector[6] lp;
  
    	lp[1] = normal_lpdf(zb | 0, 1);
  	  lp[2] = normal_lpdf(local[1] | 0, 1) - 101 * log(0.5);
  	  lp[3] = inv_gamma_lpdf(local[2] | 0.5 * df, 0.5 * df);
  	  lp[4] = normal_lpdf(global[1] | 0, 1)  - 1 * log(0.5);
  	  lp[5] = inv_gamma_lpdf(global[2] | 0.5 * df_global, 0.5 * df_global);
  	  lp[6] = inv_gamma_lpdf(c2 | 0.5 * df_slab, 0.5 * df_slab);
  
  	  return(sum(lp));
    }
  }
  data{
  int N;
  /// Horseshoe tuning
    real<lower=0> hs_df; // == 1  // If divergencies increase this
    real<lower=0, upper=1> par_ratio; // real<lower=0> hs_scale_global; // from par ratio   !! KEY PARAMETER
    real<lower=0> hs_scale_slab; // == 2 // regularisation/scale of outliers                !! KEY PARAMETER
  }
  parameters{
    vector[1] zbeta;
    // Horseshoe
    vector<lower=0>[1] hs_local[2]; // local parameters for horseshoe prior
    real<lower=0> hs_global[2]; // horseshoe shrinkage parameters
    real<lower=0> hs_c2; // horseshoe shrinkage parameters
  }
  transformed parameters {
  	// Horseshoe
  	vector[1] beta= horseshoe_get_tp(zbeta, hs_local, hs_global, par_ratio / sqrt(N), hs_scale_slab^2 * hs_c2);
  
  }
  model{
         // Horseshoe
  	target += horseshoe_get_lp(zbeta, hs_local, hs_df, hs_global, 1, hs_c2, 4);
  }

avehtari · May 23, 2019, 10:33am

In the above code the parameters are given as data

  real<lower=0> hs_df;
  real<lower=0> hs_df_global;
  real<lower=0> hs_df_slab;
  real<lower=0> hs_scale_global;
  real<lower=0> hs_scale_slab;

See Sparsity information and regularization in the horseshoe and other shrinkage priors how these different parameters affect the distribution.

Yes, see Sparsity information and regularization in the horseshoe and other shrinkage priors how to set the above parameters based on that information.

stemangiola · August 14, 2019, 12:49pm

Thanks for the article. I think brms had a nice description as well (please correct me if there is any misunderstanding)

par_ratio = Ratio of the expected number of non-zero coefficients to the expected number of zero coefficients. So if i expect 1% of non-zero coefficient this ratio will be 1/99
hs_scale_slab = the expected value of the non-zero coefficient. For example if I expect the coefficient all being 0 except few being 2; 2 will be the value of this parameter

Topic		Replies	Views
Horseshoe prior for logistic regression in brms brms	0	1029	July 12, 2018
How to extent (regularized) horseshoe prior to multiple outcome setting? Modeling regularization	28	3669	May 8, 2020
Bayes Sparse Regression - reg. horseshoe for multinomial model Modeling techniques	17	4469	August 28, 2019
Getting bad results with horseshoe prior Modeling fitting-issues	15	2363	May 1, 2018
Horseshoe prior with diabetes dataset Modeling fitting-issues	4	3331	May 12, 2017

User-friendly implementation of regularised horseshoe

Related topics