In my data, subject i comes from one of two distributions.
With probability p_normal[i], the likelihood of subject i's value Y_obs[i]
is computed from a normal distribution (the expected value comes from the regression Y = constant + coeff*covariate, and the standard deviation is fitted).
With probability 1 - p_normal[i], the likelihood is computed from a geometric distribution (most of these individuals have Y_obs = 0, and the geometric p is fitted).
p_normal is known for each subject.
I simulated latent Y (Y_latent) with the parameters below; its histogram is shown below.
n_subj ← 200
n_obs ← 200 (for now, one observation per subject)
n_cov ← 4 (number of covariates)
coeff ← c(1.2, 1.5, -0.9, 0.8) (coefficients)
constant ← 4
Then, to simulate the observed Y, errors are applied (sd of the normal distribution = 0.4, p of the geometric distribution = 0.97),
the values are discretized with floor(Y_latent) and truncated at 0,
and the histogram of Y_obs is as below.
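For reference, the data-generating process above can be sketched in Python/NumPy (variable names mirror the model; the covariate distribution, p_normal distribution, and seed are my own assumptions, since the post does not state them):

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

n_subj, n_cov = 200, 4
coeff = np.array([1.2, 1.5, -0.9, 0.8])
constant = 4.0
sigma, p_geo = 0.4, 0.97

cov = rng.normal(size=(n_subj, n_cov))         # assumed standard-normal covariates
p_normal = rng.uniform(size=n_subj)            # known per-subject mixture probability (assumed)
is_normal = rng.uniform(size=n_subj) < p_normal

# latent values: regression plus normal error, or geometric draws (mostly 0)
Y_latent = np.where(
    is_normal,
    constant + cov @ coeff + rng.normal(scale=sigma, size=n_subj),
    rng.geometric(p_geo, size=n_subj) - 1.0,   # NumPy counts trials; shift to failures (support 0, 1, ...)
)

# discretize with floor and truncate at 0
Y_obs = np.maximum(np.floor(Y_latent), 0).astype(int)
```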
To compute the likelihood from the discretized normal distribution,
I use log_diff_exp(normal_lcdf(Y_obs + 1 | Y_expected, sigma), normal_lcdf(Y_obs | Y_expected, sigma)).
Also, to extend the usable range of (y - mu)/sigma from 8.25 up to 37.5,
I exploit the symmetry of the normal CDF and use log1m(normal_cdf(-Y_obs | -Y_expected, sigma)) when Y_obs - Y_expected is positive.
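The underflow that motivates this mirrored form can be reproduced outside Stan. In double precision, the CDF saturates at 1 for (y - mu)/sigma above roughly 8.25, so the naive difference of CDFs returns log(0), while evaluating the same interval probability in the lower tail stays finite. A sketch using SciPy, with y about 10 sd above the mean as an arbitrary example:

```python
import numpy as np
from scipy.stats import norm

def log_diff_exp(a, b):
    # log(exp(a) - exp(b)) for a > b, as in Stan's log_diff_exp
    return a + np.log1p(-np.exp(b - a))

mu, sigma, y = 0.0, 1.0, 10  # y is ~10 sd above the mean

# naive: both CDFs round to exactly 1.0, so the difference underflows to 0
with np.errstate(divide="ignore"):
    naive = np.log(norm.cdf(y + 1, mu, sigma) - norm.cdf(y, mu, sigma))

# mirrored: both log-CDF arguments sit in the well-behaved lower tail
stable = log_diff_exp(norm.logcdf(-y, -mu, sigma),
                      norm.logcdf(-y - 1, -mu, sigma))

print(naive, stable)  # -inf vs. a finite value near -53.2
```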
functions {
  // geometric log-pmf: P(X = x) = (1 - p)^x * p, support x = 0, 1, ...
  vector geometric_lp(int n, int[] x, real p) {
    vector[n] lp;
    for (i in 1:n) {
      lp[i] = x[i] * log1m(p) + log(p);
    }
    return lp;
  }
  // log P(Y_latent <= Y_obs), stable in both tails
  real log_normal_truncated(int Y_obs, real Y_reg, real sigma) {
    real lp;
    if (Y_obs - Y_reg > 0) {
      // upper tail: log(1 - Phi(-z)) instead of log(Phi(z))
      lp = log_diff_exp(0, normal_lcdf(-Y_obs | -Y_reg, sigma));
    } else {
      lp = normal_lcdf(Y_obs | Y_reg, sigma);
    }
    return lp;
  }
  // log P(Y_obs <= Y_latent < Y_obs + 1), stable in both tails
  real log_normal_discretized(int Y_obs, real Y_reg, real sigma) {
    real lp;
    if (Y_obs - Y_reg > 1) {
      // mirror the interval so both lcdf arguments lie in the lower tail
      lp = log_diff_exp(normal_lcdf(-Y_obs | -Y_reg, sigma),
                        normal_lcdf(-Y_obs - 1 | -Y_reg, sigma));
    } else {
      lp = log_diff_exp(normal_lcdf(Y_obs + 1 | Y_reg, sigma),
                        normal_lcdf(Y_obs | Y_reg, sigma));
    }
    return lp;
  }
  vector discretized_normal_lp(int n, int[] Y_obs, vector Y_reg, real sigma) {
    vector[n] lp;
    for (i in 1:n) {
      if (Y_obs[i] == 0) {
        // Y_obs == 0 collects every Y_latent below 1, so use P(Y_latent <= 1)
        lp[i] = log_normal_truncated(1, Y_reg[i], sigma);
      } else {
        lp[i] = log_normal_discretized(Y_obs[i], Y_reg[i], sigma);
      }
    }
    return lp;
  }
}
data {
  int n_subj;
  int n_obs;
  int subj_obs[n_obs];        // index of the subject each observation belongs to
  int Y_obs[n_obs];           // observations
  int subj_start_ind[n_subj]; // first observation index of each subject
  int subj_end_ind[n_subj];   // last observation index of each subject
  int n_cov;                  // number of covariates
  matrix[n_subj, n_cov] cov;  // covariates
  vector[n_subj] p_normal;    // probability that a subject is from the normal distribution
}
parameters {
  vector[n_cov] coeff;
  real intercept;
  real<lower=0> sigma;
  real<lower=0.8, upper=1> p_geo;
}
transformed parameters {
  vector[n_obs] Y_reg = rep_vector(intercept, n_obs) + cov[subj_obs] * coeff;
}
model {
  vector[n_obs] lp_zero;
  vector[n_obs] lp_reg;
  intercept ~ normal(0, 5);
  coeff ~ normal(0, 5);
  sigma ~ normal(0, 5);
  p_geo ~ uniform(0.8, 1.0); // flat within the declared bounds, so this line is optional
  lp_zero = geometric_lp(n_obs, Y_obs, p_geo);
  lp_reg = discretized_normal_lp(n_obs, Y_obs, Y_reg, sigma);
  for (i in 1:n_subj) {
    target += log_mix(
      p_normal[i],
      sum(lp_reg[subj_start_ind[i]:subj_end_ind[i]]),
      sum(lp_zero[subj_start_ind[i]:subj_end_ind[i]])
    );
  }
}
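For anyone checking the mixture term: log_mix(theta, lp1, lp2) is log(theta*exp(lp1) + (1 - theta)*exp(lp2)), i.e. a two-component log-sum-exp. A quick check of that identity (the numbers are arbitrary examples):

```python
import numpy as np
from scipy.special import logsumexp

def log_mix(theta, lp1, lp2):
    # same quantity as Stan's log_mix, computed as a log-sum-exp
    return logsumexp([np.log(theta) + lp1, np.log1p(-theta) + lp2])

theta, lp_reg, lp_zero = 0.7, -12.3, -4.5  # arbitrary example values
direct = np.log(theta * np.exp(lp_reg) + (1 - theta) * np.exp(lp_zero))
print(np.isclose(log_mix(theta, lp_reg, lp_zero), direct))  # True
```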
After warm-up, I get 87 divergent transitions. These are false positives caused by the limitations of normal_lcdf;
I checked this many times by varying the range of (y - mu)/sigma.
Even though my actual data rarely produces divergent transitions (because it has a larger standard deviation),
I was going to implement my own normal_lcdf using Boost's erf for stability,
and then found that Stan already uses it.
First, I want to know why the Stan developers had to restrict (y - mu)/sigma to the range between -37.5 and 8.25.
I would also like to hear other suggestions (anything besides increasing adapt_delta).
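On the -37.5 / 8.25 question, my guess (an assumption on my part, not a statement from the Stan developers) is that both bounds come from double precision: erfc underflows to exactly 0 a little past z = -37.5, and the CDF rounds to exactly 1.0 a little past z = 8.25, so outside that range the log-CDF is -inf or 0 with zero gradient. This can be checked directly:

```python
import math

SQRT2 = math.sqrt(2.0)

def phi(z):
    # standard normal CDF via erfc, as a double-precision reference
    return 0.5 * math.erfc(-z / SQRT2)

# lower bound: erfc is still positive at z = -37.5 but underflows soon after
print(math.erfc(37.5 / SQRT2) > 0.0)   # True (about 1e-307)
print(math.erfc(40.0 / SQRT2) == 0.0)  # True: underflow, so the log would be -inf

# upper bound: Phi(z) rounds to exactly 1.0 a little above z = 8.25,
# so log(Phi(z)) would be exactly 0 from there on
print(phi(8.0) < 1.0, phi(9.0) == 1.0)  # True True
```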