Defining a custom lpdf for the Normal-Laplace convolved density in Stan

I am trying to define the Normal-Laplace density as stated at the bottom of page 5 of https://arxiv.org/pdf/1712.07216.pdf, but also supporting a location parameter in the Normal component. I will attach my code here; I find the examples in the documentation hard to apply in this context, so there may be some obvious issues with it:

functions {
   real mills_ratio(real z) {
      // return (1 - normal_cdf(z, 0, 1)) / exp(normal_lpdf(z | 0, 1)); 
      return (1 - Phi_approx(z)) / exp(std_normal_lpdf(z)); 
   }

   real normal_laplace_lpdf(vector y, real mu, real sigma, real scale) {
      real total = 0.0;
      for (i in 1:num_elements(y)) {
         total += log((1 / sqrt(2) * scale)
                      * std_normal_lpdf((y[i] - mu) / sigma)
                      * (mills_ratio((sqrt(2) * sigma) / scale - (y[i] - mu) / sigma)
                         + mills_ratio((sqrt(2) * sigma) / scale + (y[i] - mu) / sigma)));
      }
      return total;
   }
}

I am looking to call it like this:

y2 ~ normal_laplace(mu, sqrt(sigma2), scale);

Currently this results in the following output when running with PyStan:

Rejecting initial value:
Log probability evaluates to log(0), i.e. negative infinity.
Stan can’t start sampling from this initial value.
…

Initialization between (-2, 2) failed after 100 attempts.
Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.

y2 is a vector of samples drawn from a Normal(5, 2) distribution with Laplace(0, 1.5) noise added to them; scale is passed to the Stan model as a known quantity in the data block.
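
For completeness, the rest of the model is set up roughly like this (a sketch with placeholder names like N; my actual priors are omitted):

data {
   int<lower=1> N;
   vector[N] y2;          // observed contaminated samples
   real<lower=0> scale;   // known Laplace scale
}
parameters {
   real mu;
   real<lower=0> sigma2;  // variance of the Normal component
}
model {
   y2 ~ normal_laplace(mu, sqrt(sigma2), scale);
}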

Any help would be much appreciated.


Part of the problem is that you are taking the logarithm of the standard normal log-PDF rather than the logarithm of the standard normal PDF.


I think the problem is that you have std_normal_lpdf((y[i] - mu) / sigma) where it should be exp(std_normal_lpdf((y[i] - mu) / sigma)).

I recommend calculating everything on the log scale, like this:

functions {
   real log_mills_ratio(real z) {
      // log of the Mills ratio (1 - Phi(z)) / phi(z), kept on the log scale
      return log1m(Phi(z)) - std_normal_lpdf(z);
   }

   real normal_laplace_lpdf(vector y, real mu, real sigma, real scale) {
      real total = 0.0;
      real k = (sqrt(2) * sigma) / scale;
      for (i in 1:num_elements(y)) {
         real r = (y[i] - mu) / sigma;
         // the product in the density becomes a sum of logs, and the sum of
         // Mills ratios becomes a log_sum_exp of log Mills ratios
         total += -log(sqrt(2) * scale)
                + std_normal_lpdf(r)
                + log_sum_exp(log_mills_ratio(k - r),
                              log_mills_ratio(k + r));
      }
      return total;
   }
}

Thank you for the great help; I totally missed that exp. This does now work, at least in part. My experiment involves varying the proportion of contaminated data and the Laplace scale, and I can now get through a few parameter configurations before seeing this:

…
Rejecting initial value:
Gradient evaluated at the initial value is not finite.
Stan can’t start sampling from this initial value.
…

Initialization between (-2, 2) failed after 100 attempts.
Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/local/Caskroom/miniconda/base/envs/bayes/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/Caskroom/miniconda/base/envs/bayes/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "stanfit4anon_model_69327d427e3eac45965d5233a6417087_4342000952850253565.pyx", line 373, in stanfit4anon_model_69327d427e3eac45965d5233a6417087_4342000952850253565._call_sampler_star
  File "stanfit4anon_model_69327d427e3eac45965d5233a6417087_4342000952850253565.pyx", line 406, in stanfit4anon_model_69327d427e3eac45965d5233a6417087_4342000952850253565._call_sampler
RuntimeError: Initialization failed.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "run.py", line 90, in
    unseen_data, ytildes, klmodel, betamodel, knownmodel, args.cpu_count, noise_config[0], *prior_config)
  File "/Users/Library/Mobile Documents/com~apple~CloudDocs/PhD/Synthetic Data/experiment.py", line 41, in run_experiment
    knownfit = knownmodel.sampling(data=data, warmup=warmup, iter=num_iter, chains=chains, n_jobs=cpu_count)
  File "/usr/local/Caskroom/miniconda/base/envs/bayes/lib/python3.7/site-packages/pystan/model.py", line 813, in sampling
    ret_and_samples = _map_parallel(call_sampler_star, call_sampler_args, n_jobs)
  File "/usr/local/Caskroom/miniconda/base/envs/bayes/lib/python3.7/site-packages/pystan/model.py", line 85, in _map_parallel
    map_result = pool.map(function, args)
  File "/usr/local/Caskroom/miniconda/base/envs/bayes/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/local/Caskroom/miniconda/base/envs/bayes/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
RuntimeError: Initialization failed.

Do you have any further advice for troubleshooting this? I am currently using the code you suggested for my normal_laplace_lpdf.

So the log probability is finite but its gradient is undefined? Ouch, that sounds rather tricky. My best guess is that the ratio calculation underflows. If so, this check will print a warning:

real log_mills_ratio(real z) {
  real lr = log1m(Phi(z)) - std_normal_lpdf(z); 
  if (lr < -1e30) {
    print("Mills ratio failed! ", z, " -> ", lr);
  }
  return lr;
}

Yep, I am getting a lot of -infs here; do you have any ideas for how to fix it? Some sample output:

Phi(z): 1
Std Norm lpdf: -177.215
Mills ratio failed! 18.7774 -> -inf
Phi(z): 1
Std Norm lpdf: -198.119
Mills ratio failed! 19.8595 -> -inf
Phi(z): 1
Std Norm lpdf: -598.526
Mills ratio failed! 34.5719 -> -inf
Phi(z): 1
Std Norm lpdf: -343.762
Mills ratio failed! 26.1856 -> -inf

These values don’t fill me with confidence.

log(Phi(-z)) should be more stable than (but otherwise equal to) log1m(Phi(z)).
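
Applied to the function above, that change would look like this:

real log_mills_ratio(real z) {
   // log(Phi(-z)) avoids the cancellation in log1m(Phi(z)) when Phi(z) is close to 1
   return log(Phi(-z)) - std_normal_lpdf(z);
}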

The issue probably only hurts initialization. Once the sampler finds the correct posterior region, the errors should disappear.
Standardizing your data and parameters may help initialization. The idea is that Stan initializes all parameters to values between -2 and 2, and you need to write the model so that that guess is not too far from the true values. “Too far” in this case means maybe 20 standard deviations.
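
Concretely, something along these lines (a rough sketch; mu_std and sigma2_std would be new parameters declared in place of mu and sigma2, and the original location recovers as m + s * mu_std):

transformed data {
   real m = mean(y2);
   real s = sd(y2);
   vector[num_elements(y2)] y_std = (y2 - m) / s;
   real scale_std = scale / s;  // the known Laplace scale shrinks by the same factor
}
model {
   // with standardized data, Stan's default inits in (-2, 2) land near the truth
   y_std ~ normal_laplace(mu_std, sqrt(sigma2_std), scale_std);
}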


The infs are going to be difficult to avoid with the Phi function, since Phi will underflow to 0 below -37.5 and overflow to 1 above 8.

So for log1m(Phi(z)), if z is > 8, this evaluates to log(1 - 1) = log(0) = -inf.

For log(Phi(-z)), if z >= 37.5, this evaluates to log(Phi(-37.5)) = log(0) = -inf.
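
One way around this (an untested sketch on my part) is to switch to the standard asymptotic series for the Mills ratio once z is large, so the log is computed directly and never goes through Phi:

real log_mills_ratio(real z) {
   if (z > 8) {
      // asymptotic series: (1 - Phi(z)) / phi(z) ~ (1/z) * (1 - 1/z^2 + 3/z^4 - ...)
      real zsq = square(z);
      return -log(z) + log1p((3.0 / zsq - 1.0) / zsq);
   }
   // stable for moderate z, per the suggestion above
   return log(Phi(-z)) - std_normal_lpdf(z);
}

The first omitted term of the series is 15 / z^6, so at z = 8 it is already accurate to roughly 1e-4 on the log scale, and it improves rapidly for larger z.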


That’s right. According to the manual, Phi_approx should work better in the tails.

If at all useful, tensorflow_probability has special functions for log(Phi(x)) which have better numerical stability and might help here:

A very basic example of usage:

>>> import tensorflow_probability as tfp
>>> from tensorflow_probability.python.internal import special_math
>>> dist = tfp.distributions.Normal(0.0, 1.0)
>>> dist.log_cdf(-100.0)
<tf.Tensor: shape=(), dtype=float32, numpy=-5005.524>
>>> special_math.log_ndtr(-100.0)
<tf.Tensor: shape=(), dtype=float32, numpy=-5005.524>
>>> dist.cdf(-100.0)
<tf.Tensor: shape=(), dtype=float32, numpy=0.0>

I think the manual is a bit misleading here, because Phi_approx(z) is the logistic approximation inv_logit(0.07056 * z^3 + 1.5976 * z), which will underflow to 0 at roughly z < -21.5 and overflow to 1 at roughly z > 5.


If you have any suggestions on how to improve it, put them here and I’ll make a PR about it (unless you want to do it yourself directly, of course).

I usually just stick to Phi() if I can’t avoid it. ;) The issue with Phi_approx() has come up here a few times, and I guess the consensus was that Stan would profit from better (log) CDF implementations in general. I’m useless at anything C++, so I would probably only mess everything up… Haha. But I guess the tfp implementation above would maybe be a good start, no?

I meant to ask you for suggestions on how to improve the manual, but rereading what I wrote, it’s clear that I didn’t convey that… :)

Thanks for the link. That’s some terse doc! I looked at the code and it’s a lot more complex than the doc indicates. But definitely something we can do if someone has interest. It’s an important feature.

R can also handle logs of normal cdfs to much more precision than we can.

I thought someone was working on this already in Stan, but maybe I’m confusing it with something else.

Haha, okay, I misread that completely! Hm, I’m afraid I don’t have any specific suggestion for the manual.