The myth of log_prob from a Stan model?

Hi Stan team and community.

I have been applying black box stochastic variational inference from the viabel GitHub repository, using PyStan + Python.

I came across parameter regularization using the Finnish horseshoe prior for estimating a linear regression. Below is the Stan code to do so.

```stan
data {
  int<lower=1> N; // Number of data
  int<lower=1> M; // Number of covariates
  matrix[M, N] X;
  real y[N];
}

// slab_scale = 5, slab_df = 25 -> 8 divergences

transformed data {
  real m0 = 5;           // Expected number of large slopes
  real slab_scale = 3;    // Scale for large slopes
  real slab_scale2 = square(slab_scale);
  real slab_df = 25;      // Effective degrees of freedom for large slopes
  real half_slab_df = 0.5 * slab_df;
}

parameters {
  vector[M] beta_tilde;
  vector<lower=0>[M] lambda;
  real<lower=0> c2_tilde;
  real<lower=0> tau_tilde;
  real<lower=0> sigma;
}

transformed parameters {
  vector[M] beta;
  {
    real tau0 = (m0 / (M - m0)) * (sigma / sqrt(1.0 * N));
    real tau = tau0 * tau_tilde; // tau ~ cauchy(0, tau0)

    // c2 ~ inv_gamma(half_slab_df, half_slab_df * slab_scale2)
    // Implies that marginally beta ~ student_t(slab_df, 0, slab_scale)
    real c2 = slab_scale2 * c2_tilde;

    vector[M] lambda_tilde =
      sqrt( c2 * square(lambda) ./ (c2 + square(tau) * square(lambda)) );

    // beta ~ normal(0, tau * lambda_tilde)
    beta = tau * lambda_tilde .* beta_tilde;
  }
}

model {
  beta_tilde ~ normal(0, 1);
  lambda ~ cauchy(0, 1);
  tau_tilde ~ cauchy(0, 1);
  c2_tilde ~ inv_gamma(half_slab_df, half_slab_df);
  sigma ~ normal(0, 2);

  y ~ normal(X' * beta, sigma);
}
```

As you can see, the constrained_param_names() of the Stan model above are beta_tilde, lambda, c2_tilde, tau_tilde and sigma (all are non-negative values except for beta_tilde).

Then I tried to use log_prob, passing the parameters in the same order as in Stan4class.model_pars.

I am confused about how log_prob from the Stan fit is computed: the samples I used contained, for instance, negative values of lambda, c2_tilde and so forth, yet log_prob was still able to compute a value.

Does log_prob compute the joint log density of all parameters, including the likelihood?

If so, how can negative values of those non-negative parameters produce a log_prob in the first place?

I am looking forward to more details on how it is computed.

Any knowledge is appreciated,
Happy Quarantined Halloween!

Best

Patt.

grad_log_prob and log_prob in rstan work on the unconstrained scale.

Every parameter in Stan is actually represented on an unconstrained (-\infty, \infty) scale, and the constraints are achieved via a change of variables (described here).

When you pass in parameters, you're working on the unconstrained scale.
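To make that concrete: a parameter declared with a lower bound of zero (like lambda, tau_tilde, c2_tilde and sigma above) is stored internally as an unconstrained real and exponentiated back. A minimal sketch of that mapping (the numbers are arbitrary):

```python
import numpy as np

# Stan's lower-bound transform for a <lower=0> parameter: the algorithm works
# with an unconstrained value u, and the model sees constrained = exp(u).
u = -1.2                   # a "negative lambda" on the unconstrained scale
constrained = np.exp(u)    # about 0.30, automatically positive

# The log-Jacobian of this map, log|d exp(u)/du| = u, is the adjustment term
# that comes up further down the thread.
print(u, constrained)
```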

I remember a bug with log_prob itself recently: https://github.com/stan-dev/rstan/issues/834, so if you see these functions doing something weird definitely ask.

Also to preformat code so it shows up nicely, surround it with three backticks (three ` before and three after).

Here is a small example:
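(The example was posted as a separate link and is not reproduced here; below is a stand-in sketch in PyStan 2 of the same workflow. The toy model with a ~ normal(0, 1) and b ~ normal(0, 2) mirrors the parameters mentioned later in the thread, but giving b a lower bound is purely my own choice for illustration, and the method names unconstrain_pars / log_prob / grad_log_prob are how I recall the PyStan 2 fit interface, so check them against your version.)

```python
import pystan  # PyStan 2.x assumed

toy_code = """
parameters {
  real a;
  real<lower=0> b;   // lower bound added only so the transform has work to do
}
model {
  a ~ normal(0, 1);
  b ~ normal(0, 2);
}
"""

sm = pystan.StanModel(model_code=toy_code)
fit = sm.sampling(iter=500, chains=1, seed=1)

# A point on the constrained scale (b must be positive here) ...
constrained = {"a": -0.3, "b": 1.5}

# ... mapped to the unconstrained scale: a stays as-is, b becomes log(b).
upars = fit.unconstrain_pars(constrained)
print(upars)   # approximately [-0.3, 0.405]

# log_prob and grad_log_prob expect this unconstrained vector, so any real
# values are legal; a negative second entry would simply mean 0 < b < 1.
lp = fit.log_prob(upars, adjust_transform=True)
grad = fit.grad_log_prob(upars, adjust_transform=True)
print(lp, grad)
```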

1 Like

Thank you, now it is almost clear to me why it is able to produce log_prob… and noted about the three backticks for the code lines.

@ahartikainen Thank you for your reply. I will take a look at the older post, and I hope I can pass this knowledge on to other people who have the same question as mine.

Just remember that depending on what you’re doing you may need to include the Jacobian adjustment in these calculations.

If you're writing some sort of probabilistic thing using Stan gradients (and I assume you are, since you mention variational inference), you'll want this adjustment. It is the Jacobian term in the change of variables.

If the sampler was on the constrained scale, you could work with:

p(constrained)

Since the parameters are on the unconstrained scale, with constrained = f^{-1}(unconstrained), you'll want to work with

p(f^{-1}(unconstrained)) |det(Jac(f^{-1}(unconstrained)))|

like in the manual.
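Spelled out for a single lower-bounded parameter, that formula looks as follows (a half-normal prior and arbitrary numbers, purely for illustration; as far as I know the adjust_transform argument of log_prob in rstan/PyStan is what toggles this term):

```python
import numpy as np
from scipy import stats

# One parameter with b > 0 and prior b ~ normal(0, 2) (a half-normal).
# The sampler works with u on the unconstrained scale, where b = f^{-1}(u) = exp(u).
u = -0.7
b = np.exp(u)

# Without the adjustment: just the model's log density evaluated at b.
lp_unadjusted = stats.norm(0, 2).logpdf(b)

# With the change of variables: log p(f^{-1}(u)) + log|det Jac(f^{-1}(u))|,
# and for b = exp(u) the log-Jacobian term is simply u.
lp_adjusted = stats.norm(0, 2).logpdf(b) + u

print(lp_unadjusted, lp_adjusted)
```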

I just say this verbosely because everyone ends up forgetting this and making a mistake. I've done it for sure.

3 Likes

Oh yes. I updated my example to show how this is done.

1 Like

@ahartikainen Thank you again for the clear explanation of how log_prob is computed from the "unconstrained" parameters.

One more thing: could you please tell me how exactly that lp[0] is computed?
There are two parameters in the example, a ~ normal(0, 1) and b ~ normal(0, 2).
Is it from the log joint pdf p(a, b)? Without the constant term of the normal distribution?

I ask because I tried it, but it did not deliver the lp[0] shown in the example, even when I used the unconstrained version of the parameter.
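To make the question concrete, here is what the two candidate computations look like for arbitrary values of a and b (keep in mind that Stan's ~ statements drop additive constants, and a constrained parameter adds the Jacobian term discussed above, so either can account for a mismatch):

```python
import numpy as np
from scipy import stats

a, b = -0.3, 1.5   # illustrative values for the two parameters

# Full log joint density of the priors, normalizing constants included:
lp_full = stats.norm(0, 1).logpdf(a) + stats.norm(0, 2).logpdf(b)

# The same quantity with the constants of the normals dropped, which is what
# "~" sampling statements contribute to Stan's target:
lp_dropped = -0.5 * a**2 - 0.5 * (b / 2.0) ** 2

print(lp_full, lp_dropped)   # differ by a constant, not by anything in a or b
```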

Best,
Patt

I am just adding this case study that explains the effect of the Jacobian. The moral, according to Bob, is that full Bayesian inference is insensitive to parameterization with an appropriate Jacobian adjustment (unlike the posterior mode).

1 Like