Dependent variable: Can it be a derived quantity?

Doug · June 21, 2022, 2:47pm

I would like to do one of the following:

(events[i] / pop[i]) ~ lognormal(mu[i], sigma)

or

rate[i] ~ lognormal(mu[i], sigma)
rate[i] = events[i] / pop[i]

events is a count which is the sum of elements (some observed and some predicted) from another level of this multi-level model (not shown here). Because some of the elements going into events are model predictions, this is not just a data transformation.
pop is observed data on population size.
rate is an unobserved latent parameter quantifying the rate of events per person.
mu is a linear regression that is not shown.

Because neither of these model specifications are allowed (as far as I know), I have fallen back on:

pop[i] ~ poisson(events[i] / rate[i])
rate[i] ~ lognormal(mu[i], sigma)

but this has some undesirable characteristics.

Are there any coding tricks that could make either of the first two options possible?

More generally, is it statistically possible for the likelihood to be defined with a response variable that is a derived quantity (i.e. either unobserved, or at least imperfectly observed)? I suspect not, but I am looking for a good explanation of why not… or a strategy for how to do it!

Doug · June 21, 2022, 2:53pm

Cross-posted here: JAGS: Just Another Gibbs Sampler / Discussion / Open Discussion: Derived quantity as model response

andrjohns · June 22, 2022, 9:03am

It’s completely fine to specify priors for transformations of parameters & data, are you receiving some kind of error when you do so?

This specification should compile without issue:

(events[i] / pop[i]) ~ lognormal(mu[i], sigma)

The only situations where you need to be careful are when a non-linear transformation is applied, in which case you would need to add a jacobian adjustment.

For more information on this, see this forum post and this section of the Stan manual

EDIT:

If you want to specify a prior on (events[i] / pop[i]) and also use that quantity elsewhere in the model, you just need to construct it in the transformed parameters block:

transformed parameters {
  vector[N] rate = events ./ pop;
}

model {
  rate ~ lognormal(mu, sigma);
}

mike-lawrence · June 22, 2022, 10:37pm

And, to clarify, since the transform involves division by a quantity that is data, it’s fine. Division by a parameter would need a jacobian adjustment (ex. here).

jsocolar · June 23, 2022, 2:09am

When you write:

This means that log(events/pop) ~ normal(mu, sigma) which rearranges as follows:
log(events) - log(pop) ~ normal(mu, sigma)
log(events) ~ normal(mu + log(pop), sigma)
events ~ lognormal(mu + log(pop), sigma)

But your question still remains, since events itself is apparently a function of multiple parameters. See the excellent answers from @andrjohns and @mike-lawrence for how to deal with this. However, what confuses me is that you say

If events is a sum including other parameters, it cannot be a count, since all parameters in Stan are continuous. I suppose that you don’t really mean that events is a count?

Topic		Replies	Views
Priors on derived quantities of model Modeling	4	862	March 2, 2022
Specifying a model with sum of latent variables Modeling techniques , fitting-issues , specification	2	693	March 16, 2023
Question on unusual change of variables + prior specification Modeling	7	528	March 30, 2020
Target+= and multivariate likelihood Modeling	3	1139	March 13, 2019
Use derived quantity from one model as response variable in another Modeling rstan , techniques	8	439	July 29, 2022

Dependent variable: Can it be a derived quantity?

Related topics