# How to specify likelihood when you don't have proper data points (but have bounds instead)

Say I have a set of data points that I think follow a lognormal distribution with parameters mu, sigma.
So far, so good. Only now, instead of actual data points (e.g., `x[1] = 3`, `x[2] = 10`), I only have lower and upper bounds for each data point (e.g., `x[1] > 1 & x[1] < 5`, `x[2] > 6 & x[2] < 20`). How can I specify that in Stan? The model below is almost but not quite correct:

```stan
data {
  int<lower=1> n;
  int<lower=0> lower_bound[n];
  int<lower=0> upper_bound[n];
}

parameters {
  real mu;
  real<lower=0> sigma;
}

model {
  for (i in 1:n) {
    target += lognormal_lcdf(upper_bound[i] | mu, sigma);
    target += lognormal_lccdf(lower_bound[i] | mu, sigma);
  }
  mu ~ normal(3, 0.5);
  sigma ~ lognormal(log(0.5), 0.4);
}
```

It’s not correct because:
`p(x > LB & x < UB) = p(x > LB) * p(x < UB | x > LB)`
which is generally (and, in this case, definitely) not equal to `p(x > LB) * p(x < UB)`, which is what my model implies. If the “original” distribution is a lognormal, I think `p(x < UB | x > LB)` is basically a truncated (and of course re-normalized) lognormal, right? It’d be truncated at LB. But I don’t know how to specify that, and a search for “truncated” in the documentation only turns up results on truncated data (i.e., data that is reported only if it’s within fixed bounds), which is not exactly what I have here.
Can anyone help? Thanks a lot!

It’s hard to say without knowing more details. Some questions:

• Why do you only have upper and lower bounds?
• Are these results from a measurement (and if so, what are you measuring)?
• Are these bounds somehow centered around some quantity?

• Why do you think your data follows a lognormal distribution?
• Why do you think the mean trend of your data is a constant rather than a function of some other variables?

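```stan
model {
  for (i in 1:n) {
    // weight[i] must be declared in the data block (not shown here).
    // Log-probability that observation i falls between its two bounds;
    // older Stan releases wrote the CDF arguments with commas instead of `|`.
    target += weight[i] * log(lognormal_cdf(upper_bound[i] | mu, sigma)
                              - lognormal_cdf(lower_bound[i] | mu, sigma));
  }
  mu ~ normal(3, 0.5);
  sigma ~ lognormal(log(0.5), 0.4);
}
```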
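
For anyone wondering why a plain difference of CDFs is enough here: writing F for the lognormal CDF with parameters mu and sigma (F is just shorthand introduced for this note) and assuming LB < UB, the factorization from the question collapses like this:

```latex
\begin{aligned}
P(LB < x < UB) &= P(x > LB)\, P(x < UB \mid x > LB) \\
               &= \bigl(1 - F(LB)\bigr) \cdot \frac{F(UB) - F(LB)}{1 - F(LB)} \\
               &= F(UB) - F(LB)
\end{aligned}
```

So the truncated lognormal does show up in the middle step, but its normalizing constant cancels against `p(x > LB)`, which is why no explicit truncation is needed in the model.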

By the way, my data is structurally identical to binned data, and I found that someone had already come up with essentially the same solution: https://www.reddit.com/r/rstats/comments/b05su0/estimating_continuous_distribution_params_from/

FYI this is more commonly called (interval) censoring and is covered in the Stan user’s guide section 4.3. Also you may be interested to know that you can fit regression models with censored outcomes very easily in `brms` without writing your own Stan code using the `cens()` function.
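
In case it helps, here is a minimal sketch of an interval-censored likelihood as a complete program, reusing the priors from this thread. It is not copied from the user's guide. It swaps the `log(cdf - cdf)` expression for `log_diff_exp` applied to two `lognormal_lcdf` values, which computes the same quantity on the log scale and may behave better when both CDF values are very small. Two assumptions of mine, not from the original post: the bounds are declared as vectors rather than integer arrays, and every observation satisfies `0 < lower_bound[i] < upper_bound[i]`.

```stan
data {
  int<lower=1> n;
  vector<lower=0>[n] lower_bound;  // left end of each interval (assumed > 0)
  vector<lower=0>[n] upper_bound;  // right end, assumed > lower_bound[i]
}

parameters {
  real mu;
  real<lower=0> sigma;
}

model {
  for (i in 1:n) {
    // log(F(UB) - F(LB)), evaluated without leaving the log scale
    target += log_diff_exp(lognormal_lcdf(upper_bound[i] | mu, sigma),
                           lognormal_lcdf(lower_bound[i] | mu, sigma));
  }
  mu ~ normal(3, 0.5);
  sigma ~ lognormal(log(0.5), 0.4);
}
```

If the observations come as bins with counts, multiplying each `log_diff_exp` term by the bin count would mirror the `weight[i]` factor used earlier in the thread.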
