Say I have a set of data points that I think follow a lognormal distribution with parameters mu and sigma.
So far, so good. Only now, instead of actual data points (e.g., x[1] = 3, x[2] = 10), I only have lower and upper bounds for each data point (e.g., x[1] > 1 & x[1] < 5, x[2] > 6 & x[2] < 20). How can I specify that in Stan? The model below is almost but not quite correct:
data {
  int<lower=1> n;
  int<lower=0> lower_bound[n];
  int<lower=0> upper_bound[n];
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  for (i in 1:n) {
    target += lognormal_lcdf(upper_bound[i] | mu, sigma);
    target += lognormal_lccdf(lower_bound[i] | mu, sigma);
  }
  mu ~ normal(3, 0.5);
  sigma ~ lognormal(log(0.5), 0.4);
}
It’s not correct because

p(x > LB & x < UB) = p(x > LB) * p(x < UB | x > LB),

which is generally (and, in this case, definitely) not equal to p(x > LB) * p(x < UB), which is what my model implies. If the “original” distribution is a lognormal, I think p(x < UB | x > LB) is basically a truncated (and of course re-normalized) lognormal, right? It’d be truncated at LB. But I don’t know how to specify that, and a search for “truncated” in the documentation turns up results on truncated data (i.e., data that is reported only if it falls within fixed bounds), which is not exactly what I have here.
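For what it’s worth, my best guess is that each observation should contribute the interval probability log(F(UB) − F(LB)) in a single term, which I think can be written with log_diff_exp on the two lcdf values — but I’m not sure this is right or idiomatic:

```stan
model {
  for (i in 1:n) {
    // intended: log p(LB < x < UB) = log(F(UB) - F(LB)),
    // computed on the log scale from the two lcdf values
    target += log_diff_exp(lognormal_lcdf(upper_bound[i] | mu, sigma),
                           lognormal_lcdf(lower_bound[i] | mu, sigma));
  }
  mu ~ normal(3, 0.5);
  sigma ~ lognormal(log(0.5), 0.4);
}
```

If that’s equivalent to the truncated-lognormal formulation I described, great — but I’d appreciate confirmation.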
Can anyone help? Thanks a lot!
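P.S. A quick numerical check (in Python with SciPy, just to double-check my claim that the two quantities differ; mu = 3, sigma = 0.5, and the bounds 6 and 20 are just values from my example above):

```python
import numpy as np
from scipy.stats import lognorm

mu, sigma = 3.0, 0.5
# SciPy's lognormal parameterization: shape s = sigma, scale = exp(mu)
dist = lognorm(s=sigma, scale=np.exp(mu))

LB, UB = 6.0, 20.0
# What my model accumulates: log p(x < UB) + log p(x > LB)
model_term = dist.logcdf(UB) + dist.logsf(LB)
# The interval probability I actually want: log p(LB < x < UB)
interval_term = np.log(dist.cdf(UB) - dist.cdf(LB))
print(model_term, interval_term)  # these are not equal
```

The two printed values disagree, so the model really is mis-specified.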