Hello everyone, I am starting to model genetic data, I want to test a modeling assumption, that using a smoothness prior over the regression coefficients in an association study can be a sensible way to go.
In simple and I hope clear term, coefficients with high LD( r parameter in the below model) should be closer together in their effect sizes (Thus allowing us to discern relevant regions of the genome instead of focusing only on the relevant coefficients). Also we know that LD decays exponentially with distance between coefficients, thus I want to incorporate that information in the prior to get both sparsity and smoothness similarly to the fused lasso. I thought on having the distance between two neighbours weighted by their ld as the first angle of attack.
My first naive approach is the model below, taking it from the sparsity tutorial of Michael Betancourt. In which additional to the normal finnish horseshoe I add an smoothness prior to the coefficients. I am worried I have overlooked a Jacobian adjustment (as the compiler warns me). So I would really appreciate your opinion on if the finnish horseshoe is approapiate for this task and if the code below makes sense.
Thank you in advance
data {
int<lower=1> N; // Number of data
int<lower=1> M; // Number of covariates
matrix[M, N] X;
real y[N];
vector[M-1] r;
}
// slab_scale = 5, slab_df = 25 -> 8 divergences
transformed data {
real m0 = 10; // Expected number of large slopes
real slab_scale = 3; // Scale for large slopes
real slab_scale2 = square(slab_scale);
real slab_df = 25; // Effective degrees of freedom for large slopes
real half_slab_df = 0.5 * slab_df;
}
parameters {
vector[M] beta_tilde;
vector<lower=0>[M] lambda;
real<lower=0> c2_tilde;
real<lower=0> tau_tilde;
real alpha;
real<lower=0> sigma;
}
transformed parameters {
vector[M] beta;
{
real tau0 = (m0 / (M - m0)) * (sigma / sqrt(1.0 * N));
real tau = tau0 * tau_tilde; // tau ~ cauchy(0, tau0)
// c2 ~ inv_gamma(half_slab_df, half_slab_df * slab_scale2)
// Implies that marginally beta ~ student_t(slab_df, 0, slab_scale)
real c2 = slab_scale2 * c2_tilde;
vector[M] lambda_tilde =
sqrt( c2 * square(lambda) ./ (c2 + square(tau) * square(lambda)) );
// beta ~ normal(0, tau * lambda_tilde)
beta = tau * lambda_tilde .* beta_tilde;
}
}
model {
beta_tilde ~ normal(0, 1);
lambda ~ cauchy(0, 1);
tau_tilde ~ cauchy(0, 1);
c2_tilde ~ inv_gamma(half_slab_df, half_slab_df);
alpha ~ normal(0, 2);
sigma ~ normal(0, 2);
beta[2:N] ~ exponential( r .*(beta[2:N]-beta[1:(N-1)]));
y ~ normal(X' * beta + alpha, sigma);
}