Hello! I am wondering if anyone would be able to point me to examples (or books or tutorials) showing principled ways of coming up with estimates that combine the results of two or more models fit to different kinds of datasets. I have been constructing inferences using samples from the posteriors of two separately fit models – one that was fit to a dataset of individuals and the other fit to a dataset of aggregate counts – but I’m concerned about (e.g.) the errors of these models being correlated and I’m not sure how to account for that possibility, nor have I been able to find a good/complete example of doing so online.

Apologies if this question is too vague…

The specific data and model that are prompting my question:

I have person-level data on every lethal overdose reported to the coroner’s office of a major metropolitan area, including toxicology screens for each person. I’m currently trying to estimate (and then predict) the involvement of heroin in these lethal overdose cases. My primary goal is to model the percent of these overdoses that involve a positive tox screen for heroin, so I fit a binomial model to the data using McElreath’s map2stan interface. The exact date of death for each case, the age of the person, and the zip code where the death occurred (as a varying intercept) were the best predictors.

I also wanted to predict the *number* of overdoses involving heroin, so I aggregated all-cause overdose counts by year and zip code and fit a poisson regression to the overdose counts, with year and a varying intercept for zip code as the only predictors. It was a pretty good fit.

Here’s the part I feel sketchy about: To get heroin-related death count estimates for each year, I just drew ~20k positive heroin tox screen rate estimates from the posterior of the binomial model and multiplied them by ~20k count estimates from the poisson model for each zip and each year to yield a posterior distribution for number of heroin-related deaths at each year in each zip code.

The (city-level aggregate) results come out looking like this (shaded regions are 95%HPDI):

Are there more accurate ways of combining inferences from two different models, for example, methods that account for possible correlations between the two models? I’m not sure how to go about this, particularly given that the datasets I used to fit these models are different in nature (individual-level data versus counts of individuals). If anyone has an example they could point me to I’d be very grateful. It seems like my credible intervals should be wider than they are given that there are so many sources of uncertainty…

Thank you so much.