I see,

I have a different perspective (maybe numerically equivalent). In my case M = number of genes. My data is organised by counts of sequenced molecules (genes). Mine is not organised as classification problem, but a series (n~20K) of (multiple or single) regressions, in my case with a continuous covariate.

For example, 20000 of these

Now, ignore this spline regression (old attempt), if I use a linear function (my final goal is to use a generalised sigmoid, but it changes little for my question)

gene1 -> [y1_1,…,y1_T] = alpha1 + beta1 * [x,…,x]

gene2 -> [y2_1,…,y2_T] = alpha2 + beta2 * [x,…,x]

gene3 -> [y3_1,…,y3_T] = alpha3 + beta3 * [x,…,x]

gene4 -> [y4_1,…,y4_T] = alpha4 + beta4 * [x,…,x]

gene5 -> [y5_1,…,y5_T] = alpha5 + beta5 * [x,…,x]

I would like to have a sparsity assumption on beta, and maybe using your horseshoe I could achieve that

So:

```
beta ~ finnish_horseshoe(.., .., .., ..)
```

rather than using

```
beta ~ normal(0, ..)
```

and:

- improve my false positive rate. With little replication (e.g., 13 patients rather than 500) I will obtain a lot of noisy results, that I don’t want to be distracted from and cannot easily filtered out, choosing an arbitrary threshold
- identify a tight cluster of fixed genes that will provide normalisation for my patients (see below, I use multinomial(softmax( rates with non changing genes anchors )) framework )

Do you think this would in principle invalidate the use of your prior, because the paradigm is not anymore LASSO/classification/feature selection?

I hope the following model of mine is not distracting from the first question. In the case of your

```
real tau0 = (m0 / (G - m0)) * (**sigma** / sqrt(1.0 * N));
```

I don’t have **sigma** as I use multinomial with softmax *(with a biological motivation being gene expression is not a real simplex as there are not limited resources, so genes do not compete in that sense, and the changes in transcriptions can be asymmetric, meaning there could be just increases for a set of genes while the others remain unchanged)*

```
data {
int<lower = 0> G; // all genes
int<lower = 0> T; // tube/patients/sample/replicates synonimous
int<lower = 0> y[T, G]; // RNA-seq counts
matrix[T,2] X; // design matrix, intercept and one slope
}
// Horseshoe
transformed data {
real m0 = 30; // Expected number of large slopes
real slab_scale = 5; // Scale for large slopes
real slab_scale2 = square(slab_scale);
real slab_df = 25; // Effective degrees of freedom for large slopes
real half_slab_df = 0.5 * slab_df;
}
parameters {
// Linear model
row_vector[G] beta0;
// Horseshoe
vector[G] beta_tilde;
vector<lower=0>[G] lambda;
real<lower=0> c2_tilde;
real<lower=0> tau_tilde;
}
transformed parameters{
matrix[2, G] beta; // 2 coefficients, intercept and slope
row_vector[G] beta1; // one covariate
{
real tau0 = (m0 / (G - m0)) * (sigma / sqrt(1.0 * N)); // !!! I do NOT have sigma
real tau = tau0 * tau_tilde; // tau ~ cauchy(0, tau0)
real c2 = slab_scale2 * c2_tilde;
vector[G] lambda_tilde = sqrt( c2 * square(lambda) ./ (c2 + square(tau) * square(lambda)) );
beta1[1] = tau * lambda_tilde .* beta_tilde;
}
beta = append_row(beta0, beta1);
}
model {
// Horseshoe
beta_tilde ~ normal(0, 1);
lambda ~ cauchy(0, 1);
tau_tilde ~ cauchy(0, 1);
c2_tilde ~ inv_gamma(half_slab_df, half_slab_df);
// Lnear system
beta[1] ~ normal(0,1);
for (t in 1:T)
y[t] ~ multinomial( softmax( X[t] * beta ) );
}
```

Finally, I cannot easily match the Michael implementation (Which I like because it treats explicitely the two key variables

```
real m0 = 10; // Expected number of large slopes
real slab_scale = 3; // Scale for large slopes
```

)

https://betanalpha.github.io/assets/case_studies/bayes_sparse_regression.html

With the implementation on your article, for which sigma is not part of the transformed parameter formulas

https://projecteuclid.org/euclid.ejs/1513306866