Hi, I am trying to build a model with a weighted likelihood function (observations in the interval (-cut_off, cut_off) are less likely to be observed than other observations). However, I am having problems recovering the true values even in simple scenarios. I reduced the model to the simplest possible case and found that with the very same observations, just negated, the effective sample size is 100x lower than with the positive observations.
f(x \mid \mu, \sigma, \beta) = \frac{N(x \mid \mu, \sigma)\, w(x, \beta)}{\int N(x \mid \mu, \sigma)\, w(x, \beta)\, dx},
\qquad
w(x, \beta) =
\begin{cases}
\beta & \text{if } |x| < \text{cut\_off}, \\
1 & \text{otherwise.}
\end{cases}
I thought that my custom likelihood function must be wrong, but I rewrote it in R and it seems to be fine: it integrates to 1 and has the same shape with a positive and a negative location parameter. So, could there be numerical problems when computing the probability of the weighted middle part of the distribution?
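To illustrate what I suspect, here is a minimal check in R (plain double precision; Stan's CDF implementations may of course differ in detail):

pnorm(2, 10, 1) - pnorm(-2, 10, 1)    # ~6.2e-16: a difference of two tiny
                                      # lower-tail values, fully accurate
pnorm(2, -10, 1) - pnorm(-2, -10, 1)  # ~6.7e-16: both terms lie within a few
                                      # ulps of 1, so almost all digits cancel
                                      # (the true value is also ~6.2e-16)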
functions{
  // weighted likelihood: normal density times the weight, renormalized
  real log_lik_function(real x, real mu, real sigma, real beta, real cut_off){
    real log_lik;
    real numerator;
    real denominator;
    real w;
    real denom_part[3];
    // change the weight for the current observation
    if(fabs(x) < cut_off){
      w = beta;
    }else{
      w = 1;
    }
    numerator = normal_lpdf(x | mu, sigma) + log(w);
    // the lower unweighted tail
    denom_part[1] = normal_lcdf(-cut_off | mu, sigma);
    // the weighted middle
    denom_part[2] = log(normal_cdf(cut_off, mu, sigma) - normal_cdf(-cut_off, mu, sigma)) + log(beta);
    // the upper unweighted tail
    denom_part[3] = normal_lccdf(cut_off | mu, sigma);
    denominator = log_sum_exp(denom_part);
    log_lik = numerator - denominator;
    return log_lik;
  }
}
data {
  int<lower=0> K;
  vector[K] x;
  real cut_off;
}
parameters{
  real mu;
  real<lower=0> sigma;
  real<lower=0,upper=1> beta;
}
model{
  // priors
  target += normal_lpdf(mu | 0, 10);
  target += normal_lpdf(sigma | 0, 10);
  target += beta_lpdf(beta | 1, 1);
  for(k in 1:K){
    target += log_lik_function(x[k], mu, sigma, beta, cut_off);
  }
}
The problems occur only if the true location parameter is larger (in absolute value) than circa 8. The model fits perfectly with the positive mean, with well-mixed chains and effective sample size > 2500, while fit_neg has many divergences, large R-hat, and effective sample size < 50. Yet the dataset is exactly the same; I only fit the model to -x instead of x.
(The data are simulated with no down-weighting, i.e. the true \beta = 1, just to illustrate the problem.)
m <- rstan::stan_model("test.stan")
dat <- list(
  x = rnorm(1000, 10, 1),
  K = 1000,
  cut_off = 2
)
dat_neg <- dat
dat_neg$x <- -dat$x
fit <- rstan::sampling(m, data = dat, iter = 2000, warmup = 1000, chains = 3, cores = 1,
                       control = list(adapt_delta = .90))
fit_neg <- rstan::sampling(m, data = dat_neg, iter = 2000, warmup = 1000, chains = 3, cores = 1,
                           control = list(adapt_delta = .90))
summary(fit)$summary
summary(fit_neg)$summary
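The diagnostics I refer to above can be pulled out like this (get_num_divergent is a helper available in recent rstan versions):

# number of post-warmup divergent transitions, plus the per-parameter
# n_eff / Rhat columns of the summary
rstan::get_num_divergent(fit_neg)
summary(fit_neg)$summary[, c("n_eff", "Rhat")]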
The R implementation of the likelihood function, used to check that it behaves correctly:
lik_function <- function(x, mu, sigma, beta, cut_off){
  denom_part <- rep(NA, 3)
  # change the weight for the current observation
  if(abs(x) < cut_off){
    w = beta
  }else{
    w = 1
  }
  numerator = dnorm(x, mu, sigma, log = TRUE) + log(w)
  # the lower unweighted tail
  denom_part[1] = pnorm(-cut_off, mu, sigma, lower.tail = TRUE, log.p = TRUE)
  # the weighted middle
  denom_part[2] = log(pnorm(cut_off, mu, sigma) - pnorm(-cut_off, mu, sigma)) + log(beta)
  # the upper unweighted tail
  denom_part[3] = pnorm(cut_off, mu, sigma, lower.tail = FALSE, log.p = TRUE)
  denominator = log(sum(exp(denom_part)))
  log_lik = numerator - denominator
  return(exp(log_lik))
}
x_seq <- seq(-20, 20, .01)
plot(x_seq, sapply(x_seq, function(x) lik_function(x, 10, 1, 1, 2)), type = "l")
lines(x_seq, sapply(x_seq, function(x) lik_function(x, -10, 1, 1, 2)))
integrate(Vectorize(lik_function), lower = -Inf, upper = Inf, mu = 10, sigma = 1, beta = 1, cut_off = 2)
integrate(Vectorize(lik_function), lower = -Inf, upper = Inf, mu = -10, sigma = 1, beta = 1, cut_off = 2)
In this specific case I could probably work around the problem by rescaling the values and the cut_offs to smaller numbers, but that wouldn't work for the more complex model I want to apply.
I also tried changing the way the probability of the weighted middle part of the distribution depends on mu, but it doesn't affect the results.
if(mu > 0){
  denom_part[2] = log(exp(normal_lcdf(cut_off | mu, sigma)) - exp(normal_lcdf(-cut_off | mu, sigma))) + log(beta);
}else{
  denom_part[2] = log(exp(normal_lccdf(-cut_off | mu, sigma)) - exp(normal_lccdf(cut_off | mu, sigma))) + log(beta);
}
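For what it's worth, the mu < 0 branch of this formulation looks fine pointwise when I evaluate it in R (R's pnorm and Stan's normal_lccdf are different implementations, so this may not carry over):

# both complementary CDF values are small upper-tail probabilities here,
# so the subtraction loses no precision
exp(pnorm(-2, -10, 1, lower.tail = FALSE, log.p = TRUE)) -
  exp(pnorm(2, -10, 1, lower.tail = FALSE, log.p = TRUE))  # ~6.2e-16, accurate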
And even more surprisingly, exploiting the symmetry around 0 did not help either.
if(mu > 0){
  denom_part[2] = log(normal_cdf(cut_off, mu, sigma) - normal_cdf(-cut_off, mu, sigma)) + log(beta);
}else{
  denom_part[2] = log(normal_cdf(cut_off, -mu, sigma) - normal_cdf(-cut_off, -mu, sigma)) + log(beta);
}
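This should be mathematically equivalent, since the interval is symmetric around 0: P(-cut_off < X < cut_off | mu) = P(-cut_off < X < cut_off | -mu), and flipping the sign of mu moves both CDF evaluations into the lower tail, where doubles keep full relative precision. A quick R check of the two sides (again, assuming Stan's normal_cdf behaves like pnorm):

pnorm(2, -12, 1) - pnorm(-2, -12, 1)  # 0: both terms round to exactly 1,
                                      # so the log becomes -Inf
pnorm(2, 12, 1) - pnorm(-2, 12, 1)    # ~7.6e-24: the same probability,
                                      # now computed accurately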