Sampling problems with only larger negative observations (custom even likelihood)

Hi, I am trying to build a model with a weighted likelihood function (observations in the interval (-cut_off, cut_off) are less likely to be observed than other observations). However, I am having problems recovering the true values even in simple scenarios. I stripped the model down to the simplest case and found that with the very same observations, just negated, the effective sample size is 100x lower than when the observations are positive.

f(x \mid \mu, \sigma, \beta) = \frac{N(x \mid \mu, \sigma)\, w(x, \beta)}{\int N(x \mid \mu, \sigma)\, w(x, \beta) \, dx},

and w(x, \beta) = \beta if |x| < cut_off, else w(x, \beta) = 1.

I thought that my custom likelihood function must be wrong, but I rewrote it in R and it seems to be OK - it integrates to 1 and has the same shape with a positive and a negative location parameter. So could it be some numerical problem when computing the probability of the weighted middle part of the distribution?


functions{
  // weighted likelihood
  real log_lik_function(real x, real mu, real sigma, real beta, real cut_off){
     
    real log_lik;
    real nominator;
    real denominator;
    real w;
    real denom_part[3];
    
    // change weight for current observation
    if(fabs(x) < cut_off){
      w = beta;
    }else{
      w = 1;
    }

    nominator   = normal_lpdf(x | mu, sigma)+log(w);
    
    // the lower unweighted tail
    denom_part[1] = normal_lcdf(-cut_off | mu, sigma);
    // the weighted middle
    denom_part[2] = log(normal_cdf(cut_off, mu, sigma) - normal_cdf(-cut_off, mu, sigma))+log(beta);
    // the upper unweighted tail
    denom_part[3] = normal_lccdf(cut_off | mu, sigma); 
    denominator   = log_sum_exp(denom_part);

    log_lik = nominator-denominator;
 
    return log_lik;
  }
}
data {
  int<lower=0> K;
  vector[K] x;
  real cut_off;
}
parameters{
  real mu;
  real<lower=0> sigma;
  real<lower=0,upper=1> beta;
}
model{
  // priors
  target += normal_lpdf(mu    | 0, 10);
  target += normal_lpdf(sigma | 0, 10);
  target += beta_lpdf(beta | 1, 1);
  
  for(k in 1:K){
    target += log_lik_function(x[k], mu, sigma, beta, cut_off);
  }
}

The problems occur only if the location of the true mean parameter is larger than approximately 8. fit samples perfectly with the positive mean, with well-mixed chains and effective sample size > 2500, while fit_neg has many divergences, large R-hat, and effective sample size < 50. Yet the dataset is exactly the same, except that I fit the model to -x instead of x.

(\beta is set equal to 1 for illustration of the problem.)

m <- rstan::stan_model("test.stan")

# data far above the cut_off
dat <- list(
  x = rnorm(1000, 10, 1),
  K = 1000,
  cut_off = 2
)
# the same data, only negated
dat_neg   <- dat
dat_neg$x <- -dat$x

fit     <- rstan::sampling(m, data = dat, iter = 2000, warmup = 1000, chains = 3, cores = 1,
                           control = list(adapt_delta = .90))
fit_neg <- rstan::sampling(m, data = dat_neg, iter = 2000, warmup = 1000, chains = 3, cores = 1,
                           control = list(adapt_delta = .90))

summary(fit)$summary
summary(fit_neg)$summary

The R implementation of the likelihood function, used to check that it behaves correctly:

lik_function <- function(x, mu, sigma, beta, cut_off){

  denom_part <- rep(NA, 3)
  
  # change weight for current observation
  if(abs(x) < cut_off){
    w = beta
  }else{
    w = 1
  }
  nominator   = dnorm(x, mu, sigma, log = TRUE)+log(w)
  
  # the lower unweighted tail
  denom_part[1] = pnorm(-cut_off, mu, sigma, lower.tail = TRUE, log.p = TRUE)
  # the weighted middle
  denom_part[2] = log(pnorm(cut_off, mu, sigma) - pnorm(-cut_off, mu, sigma))+log(beta)
  # the upper unweighted tail
  denom_part[3] = pnorm(cut_off, mu, sigma, lower.tail = FALSE, log.p = TRUE)
  denominator   = log(sum(exp(denom_part)))
  
  log_lik = nominator-denominator
  
  return(exp(log_lik))
}
x_seq <- seq(-20, 20, .01)
plot(x_seq, sapply(x_seq, function(x)lik_function(x, 10, 1, 1, 2)), type = "l")
lines(x_seq, sapply(x_seq, function(x)lik_function(x,-10, 1, 1, 2)))
integrate(Vectorize(lik_function), lower = -Inf, upper = Inf, mu =  10, sigma = 1, beta = 1, cut_off = 2)
integrate(Vectorize(lik_function), lower = -Inf, upper = Inf, mu = -10, sigma = 1, beta = 1, cut_off = 2)

In this specific case I could probably solve the problem by scaling the values and the cut_offs to smaller numbers, but that wouldn't work for the more complex model I want to apply.
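
For completeness, the rescaling workaround would look roughly like this (a minimal sketch; the scale_factor value is arbitrary and the estimates of mu and sigma would come out on the rescaled scale):

    scale_factor <- 5
    dat_scaled <- list(
      x       = dat$x / scale_factor,
      K       = dat$K,
      cut_off = dat$cut_off / scale_factor
    )
    fit_scaled <- rstan::sampling(m, data = dat_scaled, iter = 2000, warmup = 1000, chains = 3,
                                  control = list(adapt_delta = .90))
    # mu and sigma then need to be multiplied back by scale_factor to get them on the original scale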

I also tried making the computation of the probability of the weighted middle part of the distribution depend on the sign of mu, but it doesn't affect the results.

    if(mu > 0){
      denom_part[2] = log(exp(normal_lcdf(cut_off | mu, sigma)) - exp(normal_lcdf(-cut_off | mu, sigma)))+log(beta);
    }else{
      denom_part[2] = log(exp(normal_lccdf(-cut_off | mu, sigma)) - exp(normal_lccdf(cut_off | mu, sigma)))+log(beta);
    }

And more surprisingly, even exploiting the symmetry around 0 did not help.

    if(mu > 0){
      denom_part[2] = log(normal_cdf(cut_off, mu, sigma) - normal_cdf(-cut_off, mu, sigma))+log(beta);
    }else{
      denom_part[2] = log(normal_cdf(cut_off,-mu, sigma) - normal_cdf(-cut_off,-mu, sigma))+log(beta);
    }

Hi,
I don't see the problem directly, but one suspicious thing is that x = rnorm(1000, 10, 1) will very rarely produce values within the cutoff bounds. This also means that cut_off will be far away from any mu supported by the data, and it is possible that the computation becomes numerically unstable.

What happens when you run with x values that actually intersect the cutoff region?
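
For example, something along these lines (hypothetical values, just to make the suggestion concrete) should put a good share of the observations inside (-cut_off, cut_off):

    dat_overlap <- list(
      x = rnorm(1000, 1, 2),
      K = 1000,
      cut_off = 2
    )
    fit_overlap <- rstan::sampling(m, data = dat_overlap, iter = 2000, warmup = 1000, chains = 3,
                                   control = list(adapt_delta = .90))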

Best of luck with your model!

Hi Martin,

thank you for taking a look.

Yes, the data in the problematic scenarios are indeed outside of the weighted range (which can happen quite easily in the more complex model). If I run it with values generated closer to the cut_off, the problems disappear.

Nevertheless, I kept playing around with it for a while and found out that switching all of the density computations depending on the sign of mu actually removes the issue. Interestingly, the problem was not only in the weighted part - switching the numerator sped up the fitting process, but only when I switched all parts of the denominator did the sampling become as efficient as with a positive mu.

I have no idea why something like this should work - i.e., why the problem emerges only when computing the likelihood for large negative values, while large positive values are OK.

    if(mu > 0){
      nominator   = normal_lpdf(x | mu, sigma)+log(w);
    }else{
      nominator   = normal_lpdf(-x | -mu, sigma)+log(w);
    }
      
    if(mu > 0){
      // the lower unweighted tail
      denom_part[1] = normal_lcdf(-cut_off | mu, sigma);
      // the weighted middle
      denom_part[2] = log(exp(normal_lcdf(cut_off | mu, sigma)) - exp(normal_lcdf(-cut_off | mu, sigma)))+log(beta);
      // the upper unweighted tail
      denom_part[3] = normal_lccdf(cut_off | mu, sigma); 
    }else{
      // the lower unweighted tail
      denom_part[1] = normal_lccdf(cut_off | -mu, sigma);
      // the weighted middle
      denom_part[2] = log(exp(normal_lccdf(-cut_off | -mu, sigma)) - exp(normal_lccdf(cut_off | -mu, sigma)))+log(beta);
      // the upper unweighted tail
      denom_part[3] = normal_lcdf(-cut_off | -mu, sigma);
    }

I think (I have not checked) it could all boil down to floating point being messy - your example is in the very extreme tail of the normal distribution (12 \sigma to the more distant cutoff - that's a lot even by high-energy physics standards :-) - the probability of a value that low is < 10^{-32}), so when mu > 0, normal_lcdf(-cut_off | mu, sigma) and normal_lccdf(cut_off | mu, sigma) are likely quite large negative numbers. But when mu < 0 those become numbers very close to 0, and it is possible you lose precision, or the particular Stan implementation has issues in this range (or both).
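
You can see the same effect with plain pnorm in R (not Stan's implementation, but the underlying double-precision limitation is similar): once the lower-tail probability rounds to 1, its logarithm becomes exactly 0, while the complementary tail is still perfectly representable on the log scale.

    log(pnorm(12))                               # exactly 0 - pnorm(12) has already rounded to 1
    pnorm(12, lower.tail = FALSE, log.p = TRUE)  # about -75, the upper tail is still resolvable
    pnorm(-12, log.p = TRUE)                     # about -75 as well, by symmetry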

That is IMHO a strong indication that there is something fundamentally wrong with your model. If the data almost never intersect the cutoff range, why model the cutoff at all?

Anyway - I am glad you were able to resolve the issue.

Sorry, I messed up a bit, so just to correct: when mu = 10, sigma = 1, and cut_off = 2, the normalized distances to the cutoffs are -12 and -8, respectively.

  • normal_lcdf(-cut_off | mu, sigma) will be negative.
  • normal_lccdf(cut_off | mu, sigma) will be close to zero.

The manual https://mc-stan.org/docs/2_21/functions-reference/normal-distribution.html says that:
normal_lcdf will underflow to −∞ for (normalized distance) below -37.5 and overflow to 0 for (normalized distance) above 8.25; while normal_lccdf is complementary and will overflow to 0 for (normalized distance) below -37.5 and underflow to −∞ for (normalized distance) above 8.25.

So when mu = 10 you are well within those bounds, but when mu = -10 the normalized distances become 8 and 12 and you are out of bounds. The manual also says that using log(Phi_approx(...)) should work better in the very tails.
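
If it helps, the normalized distances are easy to check directly (plain arithmetic in R, nothing Stan-specific):

    mu_pos <- 10; mu_neg <- -10; sigma <- 1; cut_off <- 2
    (c(-cut_off, cut_off) - mu_pos) / sigma  # -12 and -8: both inside the documented range
    (c(-cut_off, cut_off) - mu_neg) / sigma  #   8 and 12: the 12 is above the 8.25 limit, so normal_lcdf overflows to 0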

Thank you very much! This completely addresses my issue.

I was originally solving this issue for the t-distribution (the manual doesn't have this note in that section, and it did not occur to me to check the manual for the normal distribution).
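
The same effect shows up in R for the t-distribution too, just much further out because of the heavier tails - the lower-tail cdf eventually rounds to 1 there as well (a small illustration, unrelated to the actual model):

    # with 3 degrees of freedom the upper tail at 1e6 is on the order of 1e-18
    log(pt(1e6, df = 3))                               # 0, the lower tail has rounded to 1
    pt(1e6, df = 3, lower.tail = FALSE, log.p = TRUE)  # a finite value around -41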