Defining a custom lpdf for the Normal-Laplace convolved density in Stan

I am trying to define the Normal-Laplace density as stated at the bottom of page 5 of https://arxiv.org/pdf/1712.07216.pdf, but also supporting a location parameter in the Normal component. I will attach my code here; I find the examples in the documentation hard to apply in this context, so there may be some obvious issues with it:

functions {
   real mills_ratio(real z) {
      // return (1 - normal_cdf(z, 0, 1)) / exp(normal_lpdf(z | 0, 1)); 
      return (1 - Phi_approx(z)) / exp(std_normal_lpdf(z)); 
   }

   real normal_laplace_lpdf(vector y, real mu, real sigma, real scale) {
      real total = 0.0;
      for (i in 1:num_elements(y)) {
         total += log((1 / sqrt(2) * scale)
                      * std_normal_lpdf((y[i] - mu) / sigma)
                      * (mills_ratio((sqrt(2) * sigma) / scale - (y[i] - mu) / sigma)
                         + mills_ratio((sqrt(2) * sigma) / scale + (y[i] - mu) / sigma)));
      }
      return total;
   }
}

I am looking to call it like this:

y2 ~ normal_laplace(mu, sqrt(sigma2), scale);

Currently this results in the following output when running with PyStan:

Rejecting initial value:
Log probability evaluates to log(0), i.e. negative infinity.
Stan can’t start sampling from this initial value.
…

Initialization between (-2, 2) failed after 100 attempts.
Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.

y2 is a vector of samples drawn from a Normal(5, 2) distribution with Laplace(0, 1.5) noise added to them; scale is passed to the Stan model as a known quantity in the data block.
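
For completeness, the rest of the model is set up roughly like this (a sketch with placeholder names like N; my actual priors are omitted):

data {
   int<lower=1> N;
   vector[N] y2;          // observed contaminated samples
   real<lower=0> scale;   // known Laplace scale
}
parameters {
   real mu;
   real<lower=0> sigma2;  // variance of the Normal component
}
model {
   y2 ~ normal_laplace(mu, sqrt(sigma2), scale);
}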

Any help would be much appreciated.


Part of the problem is that you are taking the logarithm of the standard normal log-PDF rather than the logarithm of the standard normal PDF.


I think the problem is that you have std_normal_lpdf((y[i] - mu) / sigma) where it should be exp(std_normal_lpdf((y[i] - mu) / sigma)).

I recommend calculating everything on the log scale, like this:

functions {
   real log_mills_ratio(real z) {
      // log of the Mills ratio (1 - Phi(z)) / phi(z), kept on the log scale
      return log1m(Phi(z)) - std_normal_lpdf(z);
   }

   real normal_laplace_lpdf(vector y, real mu, real sigma, real scale) {
      real total = 0.0;
      real k = (sqrt(2) * sigma) / scale;
      for (i in 1:num_elements(y)) {
         real r = (y[i] - mu) / sigma;
         // the product in the density becomes a sum of logs, and the sum of
         // Mills ratios becomes a log_sum_exp of log Mills ratios
         total += -log(sqrt(2) * scale)
                + std_normal_lpdf(r)
                + log_sum_exp(log_mills_ratio(k - r),
                              log_mills_ratio(k + r));
      }
      return total;
   }
}

Thank you for the great help; I totally missed that exp. This does now work, at least in part. My experiment involves varying the proportion of contaminated data and the Laplace scale, and I can now get through a few parameter configurations before seeing this:

…
Rejecting initial value:
Gradient evaluated at the initial value is not finite.
Stan can’t start sampling from this initial value.
…

Initialization between (-2, 2) failed after 100 attempts.
Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/local/Caskroom/miniconda/base/envs/bayes/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/Caskroom/miniconda/base/envs/bayes/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "stanfit4anon_model_69327d427e3eac45965d5233a6417087_4342000952850253565.pyx", line 373, in stanfit4anon_model_69327d427e3eac45965d5233a6417087_4342000952850253565._call_sampler_star
  File "stanfit4anon_model_69327d427e3eac45965d5233a6417087_4342000952850253565.pyx", line 406, in stanfit4anon_model_69327d427e3eac45965d5233a6417087_4342000952850253565._call_sampler
RuntimeError: Initialization failed.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "run.py", line 90, in
    unseen_data, ytildes, klmodel, betamodel, knownmodel, args.cpu_count, noise_config[0], *prior_config)
  File "/Users/Library/Mobile Documents/com~apple~CloudDocs/PhD/Synthetic Data/experiment.py", line 41, in run_experiment
    knownfit = knownmodel.sampling(data=data, warmup=warmup, iter=num_iter, chains=chains, n_jobs=cpu_count)
  File "/usr/local/Caskroom/miniconda/base/envs/bayes/lib/python3.7/site-packages/pystan/model.py", line 813, in sampling
    ret_and_samples = _map_parallel(call_sampler_star, call_sampler_args, n_jobs)
  File "/usr/local/Caskroom/miniconda/base/envs/bayes/lib/python3.7/site-packages/pystan/model.py", line 85, in _map_parallel
    map_result = pool.map(function, args)
  File "/usr/local/Caskroom/miniconda/base/envs/bayes/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/local/Caskroom/miniconda/base/envs/bayes/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
RuntimeError: Initialization failed.

Do you have any further advice for troubleshooting this? I am currently using the code you suggested for my normal_laplace_lpdf.

So the log probability is finite but its gradient is undefined? Ouch, that sounds rather tricky. My best guess is that the ratio calculation underflows. If so, this check will print a warning:

real log_mills_ratio(real z) {
  real lr = log1m(Phi(z)) - std_normal_lpdf(z); 
  if (lr < -1e30) {
    print("Mills ratio failed! ", z, " -> ", lr);
  }
  return lr;
}

Yep, I am getting a lot of -infs here; do you have any ideas for how to fix it? Some sample output:

Phi(z): 1
Std Norm lpdf: -177.215
Mills ratio failed! 18.7774 -> -inf
Phi(z): 1
Std Norm lpdf: -198.119
Mills ratio failed! 19.8595 -> -inf
Phi(z): 1
Std Norm lpdf: -598.526
Mills ratio failed! 34.5719 -> -inf
Phi(z): 1
Std Norm lpdf: -343.762
Mills ratio failed! 26.1856 -> -inf

These values don’t fill me with confidence.

log(Phi(-z)) should be more stable than (but otherwise equal to) log1m(Phi(z)).
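
Applied to the function above, that change would look like this:

real log_mills_ratio(real z) {
   // log(Phi(-z)) avoids the cancellation in log1m(Phi(z)) when Phi(z) is close to 1
   return log(Phi(-z)) - std_normal_lpdf(z);
}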

The issue probably only hurts initialization. Once the sampler finds the correct posterior region, the errors should disappear.
Standardizing your data and parameters may help initialization. The idea is that Stan initializes all parameters to values between -2 and 2, and you need to write the model so that that guess is not too far from the true values. “Too far” in this case means maybe 20 standard deviations.
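
Concretely, something along these lines (a rough sketch; mu_std and sigma2_std would be new parameters declared in place of mu and sigma2, and the original location recovers as m + s * mu_std):

transformed data {
   real m = mean(y2);
   real s = sd(y2);
   vector[num_elements(y2)] y_std = (y2 - m) / s;
   real scale_std = scale / s;  // the known Laplace scale shrinks by the same factor
}
model {
   // with standardized data, Stan's default inits in (-2, 2) land near the truth
   y_std ~ normal_laplace(mu_std, sqrt(sigma2_std), scale_std);
}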


The infs are going to be difficult to avoid with the Phi function, since Phi will underflow to 0 below -37.5 and overflow to 1 above 8.

So for log1m(Phi(z)), if z is > 8, this evaluates to log(1 - 1) = log(0) = -inf.

For log(Phi(-z)), if z >= 37.5, this evaluates to log(Phi(-37.5)) = log(0) = -inf.
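
One way around this (an untested sketch on my part) is to switch to the standard asymptotic series for the Mills ratio once z is large, so the log is computed directly and never goes through Phi:

real log_mills_ratio(real z) {
   if (z > 8) {
      // asymptotic series: (1 - Phi(z)) / phi(z) ~ (1/z) * (1 - 1/z^2 + 3/z^4 - ...)
      real zsq = square(z);
      return -log(z) + log1p((3.0 / zsq - 1.0) / zsq);
   }
   // stable for moderate z, per the suggestion above
   return log(Phi(-z)) - std_normal_lpdf(z);
}

The first omitted term of the series is 15 / z^6, so at z = 8 it is already accurate to roughly 1e-4 on the log scale, and it improves rapidly for larger z.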


That’s right. According to the manual, Phi_approx should work better in the tails.

If at all useful, tensorflow_probability has special functions for log(Phi(x)) which have better numerical stability and might help here:

A very basic example of usage:

>>> import tensorflow_probability as tfp
>>> from tensorflow_probability.python.internal import special_math
>>> dist = tfp.distributions.Normal(0.0, 1.0)
>>> dist.log_cdf(-100.0)
<tf.Tensor: shape=(), dtype=float32, numpy=-5005.524>
>>> special_math.log_ndtr(-100.0)
<tf.Tensor: shape=(), dtype=float32, numpy=-5005.524>
>>> dist.cdf(-100.0)
<tf.Tensor: shape=(), dtype=float32, numpy=0.0>

I think the manual is a bit misleading here, because Phi_approx(z) is the logistic approximation inv_logit(0.07056 * z^3 + 1.5976 * z), which will underflow to 0 at roughly z < -21.5 and overflow to 1 at roughly z > 5.


If you have any suggestions on how to improve it, put them here and I’ll make a PR about it (unless you want to do it yourself directly, of course).

I usually just stick to Phi() if I can’t avoid it. ;) The issue with Phi_approx() has come up here a few times, and I guess the consensus was that Stan would profit from better (log) CDF implementations in general. I’m useless at anything C++, so I would probably only mess everything up… Haha. But I guess the tfp implementation above would maybe be a good start, no?

I meant to ask you for suggestions on how to improve the manual, but rereading what I wrote, it’s clear that I didn’t convey that… :)

Thanks for the link. That’s some terse doc! I looked at the code and it’s a lot more complex than the doc indicates. But definitely something we can do if someone has interest. It’s an important feature.

R can also handle logs of normal cdfs to much more precision than we can.

I thought someone was working on this already in Stan, but maybe I’m confusing it with something else.

Haha, okay, I misread that completely! Hm, I’m afraid I don’t have any specific suggestion for the manual.