Hi Stan team and community.

I have been applying black-box stochastic variational inference from the viabel GitHub repository, using PyStan + Python.

I came across parameter regularization using the Finnish horseshoe prior for linear regression. Below is the Stan code.

```stan
data {
  int<lower=1> N; // Number of data points
  int<lower=1> M; // Number of covariates
  matrix[M, N] X;
  real y[N];
}

// slab_scale = 5, slab_df = 25 -> 8 divergences

transformed data {
  real m0 = 5;          // Expected number of large slopes
  real slab_scale = 3;  // Scale for large slopes
  real slab_scale2 = square(slab_scale);
  real slab_df = 25;    // Effective degrees of freedom for large slopes
  real half_slab_df = 0.5 * slab_df;
}

parameters {
  vector[M] beta_tilde;
  vector<lower=0>[M] lambda;
  real<lower=0> c2_tilde;
  real<lower=0> tau_tilde;
  real<lower=0> sigma;
}

transformed parameters {
  vector[M] beta;
  {
    real tau0 = (m0 / (M - m0)) * (sigma / sqrt(1.0 * N));
    real tau = tau0 * tau_tilde; // tau ~ cauchy(0, tau0)

    // c2 ~ inv_gamma(half_slab_df, half_slab_df * slab_scale2)
    // Implies that marginally beta ~ student_t(slab_df, 0, slab_scale)
    real c2 = slab_scale2 * c2_tilde;
    vector[M] lambda_tilde =
      sqrt(c2 * square(lambda) ./ (c2 + square(tau) * square(lambda)));

    // beta ~ normal(0, tau * lambda_tilde)
    beta = tau * lambda_tilde .* beta_tilde;
  }
}

model {
  beta_tilde ~ normal(0, 1);
  lambda ~ cauchy(0, 1);
  tau_tilde ~ cauchy(0, 1);
  c2_tilde ~ inv_gamma(half_slab_df, half_slab_df);
  sigma ~ normal(0, 2);

  y ~ normal(X' * beta, sigma);
}
```

As you can see, the constrained_param_names() from the Stan code above are beta_tilde, lambda, c2_tilde, tau_tilde, and sigma (all constrained to be non-negative except for beta_tilde).

Then I tried to call log_prob, passing the parameters in the same order as in Stan4class.model_pars.
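To make the question concrete, here is a minimal sketch of how I assemble the flat vector I pass to log_prob. The values in `draw` are made-up placeholders standing in for one row of my actual sample array, and `M` is just the number of covariates from the data block:

```python
import numpy as np

M = 10  # number of covariates (placeholder value)

# One (made-up) draw for each parameter, keyed by name.
draw = {
    "beta_tilde": np.random.randn(M),
    "lambda": np.abs(np.random.randn(M)),
    "c2_tilde": 1.2,
    "tau_tilde": 0.7,
    "sigma": 1.0,
}

# Flatten in the same order as reported by model_pars.
order = ["beta_tilde", "lambda", "c2_tilde", "tau_tilde", "sigma"]
flat = np.concatenate([np.atleast_1d(draw[name]) for name in order])

# then: lp = fit.log_prob(flat)   # <- this is the call I am asking about
```

In my real runs, some entries of `flat` corresponding to lambda, c2_tilde, etc. are negative, which is what prompted the question below.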

I am confused about how log_prob from the Stan fit is computed: the samples I passed in contained, for instance, negative values of lambda, c2_tilde, and so forth, yet log_prob was still able to return a value.

Does log_prob compute the joint log density of all parameters, including the likelihood?

If so, how can negative values of those non-negative parameters produce a log_prob at all?

I am looking forward to learning more details about how it is computed.

Any knowledge is appreciated,

Happy Quarantined Halloween!

Best

Patt.