Fitting a model to predictions from another unavailable model

Hi everyone, I have been thinking about a particular problem and would appreciate some outside advice.

Let’s say I want to model the spread of some particular phenomenon in a population. I have a good idea of the generative processes at play, and there are several measures of the consequences of this phenomenon in the population that I can fit the model to. The issue is that these measures are themselves estimates, produced from unavailable data by an unknown model that accounts for several covariates, and the uncertainty on these estimates is not available either.

So, more formally, if the available data consist of point estimates of counts y_1, y_2 and y_3, and each data point has the same weight, I can simply use the following likelihood:

\Pr(y_1,y_2,y_3|\theta) = \prod_{i=1}^3 \Pr(y_i|\theta)

But if I do that, I ignore the uncertainty on y_1, y_2 and y_3 and am therefore probably overconfident in my inference. I thought that I could maybe apply poisson_rng to the data before computing the likelihood, but this function cannot be used in the model block.

Do any of you have experience regarding this issue?

If the Poisson assumption is good, then the means give you the variances. Rather than using the RNG, you could sum over all plausible unobserved Poisson values (that is, marginalize them out) and add the log of that sum to the log density.
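The summation suggested above could look roughly like the following. This is only a sketch in plain Python, not Stan code: it assumes each reported estimate y_i is the mean of a Poisson over the unobserved count k, and that the generative model also predicts a Poisson count with mean mu_i given \theta. The function names, the truncation rule for k_max, and all numbers are hypothetical. In Stan, the same idea would use poisson_lpmf and log_sum_exp inside the model block.

```python
import math

def poisson_lpmf(k, rate):
    """Log of the Poisson pmf at integer k >= 0, for rate > 0."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def marginal_loglik(y, mu, k_max=None):
    """log sum_k Poisson(k | y) * Poisson(k | mu), truncated at k_max.

    y  : reported point estimate, used as the mean of the latent count (assumption)
    mu : count predicted by the generative model for the current theta (assumption)
    """
    if k_max is None:
        # Hypothetical truncation rule: cover roughly 10 standard
        # deviations above the larger of the two means.
        hi = max(y, mu)
        k_max = int(hi + 10 * math.sqrt(hi) + 10)
    terms = [poisson_lpmf(k, y) + poisson_lpmf(k, mu) for k in range(k_max + 1)]
    return log_sum_exp(terms)

# Made-up numbers, for illustration only.
y_est = [12.0, 30.0, 7.0]    # reported point estimates
mu_pred = [10.0, 28.0, 9.0]  # model predictions for some theta

total = sum(marginal_loglik(y, mu) for y, mu in zip(y_est, mu_pred))
print("marginalized log-likelihood:", total)
```

Compared with plugging the point estimates directly into a Poisson likelihood, the marginalized version is flatter in mu, which is the intended effect of acknowledging uncertainty in y_i.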

See the chapter in the Stan Manual on “Measurement Error and Meta-Analysis” for how to incorporate additional uncertainty on the observations.
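As a rough illustration of what a measurement-error formulation might look like here (an assumption on my part, not something taken from the manual verbatim): treat each reported y_i as a noisy measurement of an unobserved true count z_i, with normal error of scale \tau, and let the generative model describe z_i. Because the uncertainty on the estimates is not available, \tau would need its own prior.

y_i \mid z_i, \tau \sim \mathrm{Normal}(z_i, \tau), \qquad z_i \sim \Pr(z_i \mid \theta)

p(y_1,y_2,y_3 \mid \theta, \tau) = \prod_{i=1}^3 \sum_{z_i} \mathrm{Normal}(y_i \mid z_i, \tau) \, \Pr(z_i \mid \theta)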

For nominally Poisson data you could also expand the model to presume negative binomial data, which is the outcome of convolving Poisson data with variation in the Poisson intensities. You would then fit the dispersion parameter \phi along with everything else, to learn the consequences of the additional uncertainty within the context of this specific assumption.
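To make the role of \phi concrete, here is a minimal sketch in plain Python of a negative binomial log pmf parameterized by its mean mu and dispersion \phi, so that the variance is mu + mu^2 / \phi. I believe this matches the parameterization of neg_binomial_2 in Stan, but treat that as an assumption and check the documentation. The counts and predictions are made up.

```python
import math

def neg_binomial_2_lpmf(y, mu, phi):
    """Log pmf of a negative binomial with mean mu and dispersion phi.

    Variance is mu + mu**2 / phi, so small phi means strong
    overdispersion and phi -> infinity recovers the Poisson.
    """
    return (math.lgamma(y + phi) - math.lgamma(y + 1) - math.lgamma(phi)
            + y * (math.log(mu) - math.log(mu + phi))
            + phi * (math.log(phi) - math.log(mu + phi)))

def poisson_lpmf(y, mu):
    """Log of the Poisson pmf, for comparison."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

# Made-up numbers, for illustration only.
counts = [12, 30, 7]         # counts, rounded to integers
mu_pred = [10.0, 28.0, 9.0]  # model predictions for some theta

def loglik(phi):
    """Joint log-likelihood of the counts for a given dispersion phi."""
    return sum(neg_binomial_2_lpmf(y, mu, phi) for y, mu in zip(counts, mu_pred))

for phi in (1.0, 10.0, 1e6):
    print("phi =", phi, "log-likelihood =", loglik(phi))
print("Poisson log-likelihood =",
      sum(poisson_lpmf(y, mu) for y, mu in zip(counts, mu_pred)))
```

In an actual fit, \phi would be a parameter with a prior rather than a fixed value. Note that this likelihood needs integer counts, so non-integer point estimates would have to be rounded or handled differently.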

To sakrejda: Thanks, I see the idea, but would you mind giving me some hint on how to implement that?

To betanalpha: That’s a very good point, I didn’t think of the similarities with meta-analysis. I will try that, thanks!

In retrospect, I misread your question, so I no longer think the Poisson bit is relevant. Your “point estimates of counts” and poisson_rng threw me off, because I assumed you could think of these estimates as discrete, but you can’t. Their uncertainty has nothing to do with the uncertainty of a Poisson; it mostly depends on how much data was used to produce the estimates.