Hi Stan team and community.
I have been applying BLACK BOX STOCHASTIC VARIATIONAL INFERENCE from viabel gitshub using PyStan + Python.
I came across the Parameter regularization using Finnished Horseshoes prior in estimating Linear Regression. Below are the Stan code to do so.
data {
int<lower=1> N; // Number of data
int<lower=1> M; // Number of covariates
matrix[M, N] X;
real y[N];
}
// slab_scale = 5, slab_df = 25 -> 8 divergences
transformed data {
real m0 = 5; // Expected number of large slopes
real slab_scale = 3; // Scale for large slopes
real slab_scale2 = square(slab_scale);
real slab_df = 25; // Effective degrees of freedom for large slopes
real half_slab_df = 0.5 * slab_df;
}
parameters {
vector[M] beta_tilde;
vector<lower=0>[M] lambda;
real<lower=0> c2_tilde;
real<lower=0> tau_tilde;
real<lower=0> sigma;
}
transformed parameters {
vector[M] beta;
{
real tau0 = (m0 / (M - m0)) * (sigma / sqrt(1.0 * N));
real tau = tau0 * tau_tilde; // tau ~ cauchy(0, tau0)
// c2 ~ inv_gamma(half_slab_df, half_slab_df * slab_scale2)
// Implies that marginally beta ~ student_t(slab_df, 0, slab_scale)
real c2 = slab_scale2 * c2_tilde;
vector[M] lambda_tilde =
sqrt( c2 * square(lambda) ./ (c2 + square(tau) * square(lambda)) );
// beta ~ normal(0, tau * lambda_tilde)
beta = tau * lambda_tilde .* beta_tilde;
}
}
model {
beta_tilde ~ normal(0, 1);
lambda ~ cauchy(0, 1);
tau_tilde ~ cauchy(0, 1);
c2_tilde ~ inv_gamma(half_slab_df, half_slab_df);
sigma ~ normal(0, 2);
y ~ normal(X' * beta , sigma);
}
As you can see the constrained_param_names() from Stan code above are beta_tilde, lambda, c2_tilde , tau_tilde and sigma (all are non-negative values except for beta_tilde).
Then i tried to use log_prob with the same order as in Stan4class.model_pars
I am so comfused how log_prob from Stan fit is computed since the samples that i have used contained for instance negative value of lambda, c2_tilde and so forth but the log_prob is still able to compute?
DOes log_prob compute the joint distribution of all parameters including likelihood?
if so how come the negative value of those non negative parameters are able to produce log_prob from the start?
I am looking forward to more details of how it is computed.
Any knowledge is appreciated,
Happy Quarantined Halloween!
Best
Patt.