The myth of log_prob from a Stan model?

Hi Stan team and community.

I have been applying black box stochastic variational inference from the viabel GitHub repository, using PyStan + Python.

I came across parameter regularization using the Finnish horseshoe prior for estimating a linear regression. Below is the Stan code to do so.

```stan
data {
  int<lower=1> N; // Number of data
  int<lower=1> M; // Number of covariates
  matrix[M, N] X;
  real y[N];
}

// slab_scale = 5, slab_df = 25 -> 8 divergences

transformed data {
  real m0 = 5;           // Expected number of large slopes
  real slab_scale = 3;    // Scale for large slopes
  real slab_scale2 = square(slab_scale);
  real slab_df = 25;      // Effective degrees of freedom for large slopes
  real half_slab_df = 0.5 * slab_df;
}

parameters {
  vector[M] beta_tilde;
  vector<lower=0>[M] lambda;
  real<lower=0> c2_tilde;
  real<lower=0> tau_tilde;
  real<lower=0> sigma;
}

transformed parameters {
  vector[M] beta;
  {
    real tau0 = (m0 / (M - m0)) * (sigma / sqrt(1.0 * N));
    real tau = tau0 * tau_tilde; // tau ~ cauchy(0, tau0)

    // c2 ~ inv_gamma(half_slab_df, half_slab_df * slab_scale2)
    // Implies that marginally beta ~ student_t(slab_df, 0, slab_scale)
    real c2 = slab_scale2 * c2_tilde;

    vector[M] lambda_tilde =
      sqrt( c2 * square(lambda) ./ (c2 + square(tau) * square(lambda)) );

    // beta ~ normal(0, tau * lambda_tilde)
    beta = tau * lambda_tilde .* beta_tilde;
  }
}

model {
  beta_tilde ~ normal(0, 1);
  lambda ~ cauchy(0, 1);
  tau_tilde ~ cauchy(0, 1);
  c2_tilde ~ inv_gamma(half_slab_df, half_slab_df);
  sigma ~ normal(0, 2);

  y ~ normal(X' * beta, sigma);
}
```

As you can see, the constrained_param_names() of the Stan model above are beta_tilde, lambda, c2_tilde, tau_tilde and sigma (all are non-negative values except for beta_tilde).

Then I tried to use log_prob, passing the parameters in the same order as in Stan4class.model_pars.

I am confused about how log_prob from the Stan fit is computed: the samples I used contained, for instance, negative values of lambda, c2_tilde and so forth, yet log_prob was still able to compute a value.

Does log_prob compute the joint log density of all parameters, including the likelihood?

If so, how can negative values of those non-negative parameters produce a log_prob in the first place?

I am looking forward to more details on how it is computed.

Any knowledge is appreciated,
Happy Quarantined Halloween!

Best

Patt.

grad_log_prob and log_prob in rstan work on the unconstrained scale.

Every parameter in Stan is actually represented on an unconstrained (-\infty, \infty) scale, and the constraints are achieved via a change of variables (described here).

When you pass in parameters, you're working on the unconstrained scale.
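To make that concrete: a parameter declared with a lower bound of zero (like lambda, tau_tilde, c2_tilde and sigma above) is stored internally as an unconstrained real and exponentiated back. A minimal sketch of that mapping (the numbers are arbitrary):

```python
import numpy as np

# Stan's lower-bound transform for a <lower=0> parameter: the algorithm works
# with an unconstrained value u, and the model sees constrained = exp(u).
u = -1.2                   # a "negative lambda" on the unconstrained scale
constrained = np.exp(u)    # about 0.30, automatically positive

# The log-Jacobian of this map, log|d exp(u)/du| = u, is the adjustment term
# that comes up further down the thread.
print(u, constrained)
```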

I remember a bug with log_prob itself recently: https://github.com/stan-dev/rstan/issues/834, so if you see these functions doing something weird definitely ask.

Also to preformat code so it shows up nicely, surround it with three backticks (three ` before and three after).

Here is a small example:
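(The example was posted as a separate link and is not reproduced here; below is a stand-in sketch in PyStan 2 of the same workflow. The toy model with a ~ normal(0, 1) and b ~ normal(0, 2) mirrors the parameters mentioned later in the thread, but giving b a lower bound is purely my own choice for illustration, and the method names unconstrain_pars / log_prob / grad_log_prob are how I recall the PyStan 2 fit interface, so check them against your version.)

```python
import pystan  # PyStan 2.x assumed

toy_code = """
parameters {
  real a;
  real<lower=0> b;   // lower bound added only so the transform has work to do
}
model {
  a ~ normal(0, 1);
  b ~ normal(0, 2);
}
"""

sm = pystan.StanModel(model_code=toy_code)
fit = sm.sampling(iter=500, chains=1, seed=1)

# A point on the constrained scale (b must be positive here) ...
constrained = {"a": -0.3, "b": 1.5}

# ... mapped to the unconstrained scale: a stays as-is, b becomes log(b).
upars = fit.unconstrain_pars(constrained)
print(upars)   # approximately [-0.3, 0.405]

# log_prob and grad_log_prob expect this unconstrained vector, so any real
# values are legal; a negative second entry would simply mean 0 < b < 1.
lp = fit.log_prob(upars, adjust_transform=True)
grad = fit.grad_log_prob(upars, adjust_transform=True)
print(lp, grad)
```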

1 Like

Thank you, now it is almost clear to me why it is able to produce log_prob… and noted about the three backticks for the code lines.

@ahartikainen Thank you for your reply. I will take a look at the older post, and I hope I can pass this knowledge on to other people who have the same question as mine.

Just remember that depending on what you’re doing you may need to include the Jacobian adjustment in these calculations.

If you're writing some sort of probabilistic thing using Stan gradients (and I assume you are, since you mention variational inference), you'll want this adjustment. It is the Jacobian term in the change of variables.

If the sampler was on the constrained scale, you could work with:

p(constrained)

Since the parameters are on the unconstrained scale, with constrained = f^{-1}(unconstrained), you'll want to work with

p(f^{-1}(unconstrained)) |det(Jac(f^{-1}(unconstrained)))|

like in the manual.
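Spelled out for a single lower-bounded parameter, that formula looks as follows (a half-normal prior and arbitrary numbers, purely for illustration; as far as I know the adjust_transform argument of log_prob in rstan/PyStan is what toggles this term):

```python
import numpy as np
from scipy import stats

# One parameter with b > 0 and prior b ~ normal(0, 2) (a half-normal).
# The sampler works with u on the unconstrained scale, where b = f^{-1}(u) = exp(u).
u = -0.7
b = np.exp(u)

# Without the adjustment: just the model's log density evaluated at b.
lp_unadjusted = stats.norm(0, 2).logpdf(b)

# With the change of variables: log p(f^{-1}(u)) + log|det Jac(f^{-1}(u))|,
# and for b = exp(u) the log-Jacobian term is simply u.
lp_adjusted = stats.norm(0, 2).logpdf(b) + u

print(lp_unadjusted, lp_adjusted)
```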

I just say this verbosely because everyone ends up forgetting this and making a mistake. I've done it for sure.

3 Likes

Oh yes. I updated my example to show how this is done.

1 Like

@ahartikainen Thank you again for the clear explanation of how log_prob is computed from the "unconstrained" parameters.

One more thing: could you please tell me how exactly that lp[0] is computed?
There are two parameters in the example, a ~ normal(0, 1) and b ~ normal(0, 2).
Is it from the log joint pdf p(a, b)? Without the constant term of the normal distribution?

I ask because I tried it, but it did not deliver the lp[0] shown in the example, even when I used the unconstrained version of the parameter.
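To make the question concrete, here is what the two candidate computations look like for arbitrary values of a and b (keep in mind that Stan's ~ statements drop additive constants, and a constrained parameter adds the Jacobian term discussed above, so either can account for a mismatch):

```python
import numpy as np
from scipy import stats

a, b = -0.3, 1.5   # illustrative values for the two parameters

# Full log joint density of the priors, normalizing constants included:
lp_full = stats.norm(0, 1).logpdf(a) + stats.norm(0, 2).logpdf(b)

# The same quantity with the constants of the normals dropped, which is what
# "~" sampling statements contribute to Stan's target:
lp_dropped = -0.5 * a**2 - 0.5 * (b / 2.0) ** 2

print(lp_full, lp_dropped)   # differ by a constant, not by anything in a or b
```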

Best,
Patt

I am just adding this case study that explains the effect of the Jacobian. The moral, according to Bob, is that full Bayesian inference is insensitive to parameterization with an appropriate Jacobian adjustment (unlike the posterior mode).

1 Like