"Fused" Finnish horseshoe (or any sparsity inducing prior for that matter)

Hello everyone, I am starting to model genetic data, I want to test a modeling assumption, that using a smoothness prior over the regression coefficients in an association study can be a sensible way to go.

In simple and I hope clear term, coefficients with high LD( r parameter in the below model) should be closer together in their effect sizes (Thus allowing us to discern relevant regions of the genome instead of focusing only on the relevant coefficients). Also we know that LD decays exponentially with distance between coefficients, thus I want to incorporate that information in the prior to get both sparsity and smoothness similarly to the fused lasso. I thought on having the distance between two neighbours weighted by their ld as the first angle of attack.

My first naive approach is the model below, taking it from the sparsity tutorial of Michael Betancourt. In which additional to the normal finnish horseshoe I add an smoothness prior to the coefficients. I am worried I have overlooked a Jacobian adjustment (as the compiler warns me). So I would really appreciate your opinion on if the finnish horseshoe is approapiate for this task and if the code below makes sense.

Thank you in advance

data {
  int<lower=1> N; // Number of data
  int<lower=1> M; // Number of covariates
  matrix[M, N] X;
  real y[N];
  vector[M-1] r;
}

// slab_scale = 5, slab_df = 25 -> 8 divergences

transformed data {
  real m0 = 10;           // Expected number of large slopes
  real slab_scale = 3;    // Scale for large slopes
  real slab_scale2 = square(slab_scale);
  real slab_df = 25;      // Effective degrees of freedom for large slopes
  real half_slab_df = 0.5 * slab_df;
}

parameters {
  vector[M] beta_tilde;
  vector<lower=0>[M] lambda;
  real<lower=0> c2_tilde;
  real<lower=0> tau_tilde;
  real alpha;
  real<lower=0> sigma;
}

transformed parameters {
  vector[M] beta;
  {
    real tau0 = (m0 / (M - m0)) * (sigma / sqrt(1.0 * N));
    real tau = tau0 * tau_tilde; // tau ~ cauchy(0, tau0)

    // c2 ~ inv_gamma(half_slab_df, half_slab_df * slab_scale2)
    // Implies that marginally beta ~ student_t(slab_df, 0, slab_scale)
    real c2 = slab_scale2 * c2_tilde;

    vector[M] lambda_tilde =
      sqrt( c2 * square(lambda) ./ (c2 + square(tau) * square(lambda)) );

    // beta ~ normal(0, tau * lambda_tilde)
    beta = tau * lambda_tilde .* beta_tilde;
  }
}

model {
  beta_tilde ~ normal(0, 1);
  lambda ~ cauchy(0, 1);
  tau_tilde ~ cauchy(0, 1);
  c2_tilde ~ inv_gamma(half_slab_df, half_slab_df);
  
  alpha ~ normal(0, 2);
  sigma ~ normal(0, 2);
  beta[2:N] ~ exponential( r .*(beta[2:N]-beta[1:(N-1)]));
 
  y ~ normal(X' * beta + alpha, sigma);
  
}

What was the warning you got? Did it have to do with beta[1]? Was it intentional to have an improper prior on beta[1]?

If the Jacobian warning is related to this line, you are ok.

beta[2:N] ~ exponential( r .*(beta[2:N]-beta[1:(N-1)]));

Stan recognised that beta is transformed when calculating the log density but selecting a subset of parameters is a linear transformation and the Jabocian is 1 or log(1) = 0 on the log scale.

1 Like