Dependent variable: Can it be a derived quantity?

I would like to do one of the following:

(events[i] / pop[i]) ~ lognormal(mu[i], sigma)

or

rate[i] ~ lognormal(mu[i], sigma)
rate[i] = events[i] / pop[i]
  • events is a count which is the sum of elements (some observed and some predicted) from another level of this multi-level model (not shown here). Because some of the elements going into events are model predictions, this is not just a data transformation.
  • pop is observed data on population size.
  • rate is an unobserved latent parameter quantifying the rate of events per person.
  • mu is a linear regression that is not shown.

Because neither of these model specifications are allowed (as far as I know), I have fallen back on:

pop[i] ~ poisson(events[i] / rate[i])
rate[i] ~ lognormal(mu[i], sigma)

but this has some undesirable characteristics.

Are there any coding tricks that could make either of the first two options possible?

More generally, is it statistically possible for the likelihood to be defined with a response variable that is a derived quantity (i.e. either unobserved, or at least imperfectly observed)? I suspect not, but I am looking for a good explanation of why not… or a strategy for how to do it!

1 Like

Cross-posted here: JAGS: Just Another Gibbs Sampler / Discussion / Open Discussion: Derived quantity as model response

It’s completely fine to specify priors for transformations of parameters & data, are you receiving some kind of error when you do so?

This specification should compile without issue:

(events[i] / pop[i]) ~ lognormal(mu[i], sigma)

The only situations where you need to be careful are when a non-linear transformation is applied, in which case you would need to add a jacobian adjustment.

For more information on this, see this forum post and this section of the Stan manual

EDIT:

If you want to specify a prior on (events[i] / pop[i]) and also use that quantity elsewhere in the model, you just need to construct it in the transformed parameters block:

transformed parameters {
  vector[N] rate = events ./ pop;
}

model {
  rate ~ lognormal(mu, sigma);
}
3 Likes

And, to clarify, since the transform involves division by a quantity that is data, it’s fine. Division by a parameter would need a jacobian adjustment (ex. here).

1 Like

When you write:

This means that log(events/pop) ~ normal(mu, sigma) which rearranges as follows:
log(events) - log(pop) ~ normal(mu, sigma)
log(events) ~ normal(mu + log(pop), sigma)
events ~ lognormal(mu + log(pop), sigma)

But your question still remains, since events itself is apparently a function of multiple parameters. See the excellent answers from @andrjohns and @mike-lawrence for how to deal with this. However, what confuses me is that you say

If events is a sum including other parameters, it cannot be a count, since all parameters in Stan are continuous. I suppose that you don’t really mean that events is a count?

1 Like