events is a count which is the sum of elements (some observed and some predicted) from another level of this multi-level model (not shown here). Because some of the elements going into events are model predictions, this is not just a data transformation.
pop is observed data on population size.
rate is an unobserved latent parameter quantifying the rate of events per person.
mu is a linear regression that is not shown.
Because neither of these model specifications are allowed (as far as I know), I have fallen back on:
Are there any coding tricks that could make either of the first two options possible?
More generally, is it statistically possible for the likelihood to be defined with a response variable that is a derived quantity (i.e. either unobserved, or at least imperfectly observed)? I suspect not, but I am looking for a good explanation of why not… or a strategy for how to do it!
It’s completely fine to specify priors for transformations of parameters & data, are you receiving some kind of error when you do so?
This specification should compile without issue:
(events[i] / pop[i]) ~ lognormal(mu[i], sigma)
The only situations where you need to be careful are when a non-linear transformation is applied, in which case you would need to add a jacobian adjustment.
If you want to specify a prior on (events[i] / pop[i]) and also use that quantity elsewhere in the model, you just need to construct it in the transformed parameters block:
And, to clarify, since the transform involves division by a quantity that is data, it’s fine. Division by a parameter would need a jacobian adjustment (ex. here).
This means that log(events/pop) ~ normal(mu, sigma) which rearranges as follows: log(events) - log(pop) ~ normal(mu, sigma) log(events) ~ normal(mu + log(pop), sigma) events ~ lognormal(mu + log(pop), sigma)
But your question still remains, since events itself is apparently a function of multiple parameters. See the excellent answers from @andrjohns and @mike-lawrence for how to deal with this. However, what confuses me is that you say
If events is a sum including other parameters, it cannot be a count, since all parameters in Stan are continuous. I suppose that you don’t really mean that events is a count?