Inference using posterior samples from two very different models

Hello! I am wondering if anyone would be able to point me to examples (or books or tutorials) showing principled ways of coming up with estimates that combine the results of two or more models fit to different kinds of datasets. I have been constructing inferences using samples from the posteriors of two separately fit models – one that was fit to a dataset of individuals and the other fit to a dataset of aggregate counts – but I’m concerned about (e.g.) the errors of these models being correlated and I’m not sure how to account for that possibility, nor have I been able to find a good/complete example of doing so online.

Apologies if this question is too vague…

The specific data and model that are prompting my question:
I have person-level data on every lethal overdose reported to the coroner’s office of a major metropolitan area, including toxicology screens for each person. I’m currently trying to estimate (and then predict) the involvement of heroin in these lethal overdose cases. My primary goal is to model the percent of these overdoses that involve a positive tox screen for heroin, so I fit a binomial model to the data using McElreath’s map2stan interface. The exact date of death for each case, the age of the person, and the zip code where the death occurred (as a varying intercept) were the best predictors.

I also wanted to predict the number of overdoses involving heroin, so I aggregated all-cause overdose counts by year and zip code and fit a Poisson regression to those counts, with year and a varying intercept for zip code as the only predictors. It was a pretty good fit.

Here’s the part I feel sketchy about: to get heroin-related death count estimates, I just drew ~20k positive heroin tox screen rate estimates from the posterior of the binomial model and multiplied them by ~20k count estimates from the Poisson model for each zip code and each year, yielding a posterior distribution for the number of heroin-related deaths in each zip code in each year.

The (city-level aggregate) results come out looking like this (shaded regions are 95% HPDIs):

Are there more accurate ways of combining inferences from two different models, for example, methods that account for possible correlations between the two models? I’m not sure how to go about this, particularly given that the datasets I used to fit these models are different in nature (individual-level data versus counts of individuals). If anyone has an example they could point me to I’d be very grateful. It seems like my credible intervals should be wider than they are given that there are so many sources of uncertainty…

Thank you so much.

The correct statistical approach is to build a single model that encompasses both of those models at once. As with all models, however, exactly how you build that joint model will depend on the intimate details of your problem.

Here you have the difficulty of trying to combine the individual responses so that you can model the aggregated measurements. Because you didn’t measure every individual, there will be some correlation between the individual parameters and the aggregate parameters, and the magnitude of that correlation will depend on how thoroughly you sampled the population for those individual measurements.

One potential way of modeling this correlation is by thinking hierarchically. Model the individual regressions with a hierarchical population model, and then use the population parameters to build up a model for the aggregated measurements.

Hey Michael Betancourt,
Thank you so much for the response. I really like your idea, but I’m not clear on how to implement it and haven’t been able to find related examples or case studies of hierarchical population models like the one you seem to be describing. Currently I have two separate regressions of interrelated processes. My first model is for the probability that an individual person has heroin in their system when they lethally overdose. The Stan code for that model looks like this:

data{
    int<lower=1> N;    // number of observations i.e. number of people who lethally overdosed in the city
    int<lower=1> N_Injury_Zip;   // number of zip codes people lethally overdosed in
    int heroin_tox_screen[N]; // binary outcome: did the person test positive for heroin? 
    real jul[N];  // z-scored julian date of death
    int Injury_Zip[N];  //index of zip codes where death occurred
    real age[N]; // the person's (z-scored) age at death
}
parameters{
    real a;   // intercept
    vector[N_Injury_Zip] a_ZIP;   // zip code-level varying intercept
    real bju;  // coefficient for julian date of death 
    real bju2; // coef. for julian date of death ^2
    real bju3; // coef. for julian date of death ^3 (higher-order polynomials don't improve fit w.r.t. WAIC)
    real ba;  // coef. for age
}
model{
    vector[N] A;  //intercept term
    vector[N] p;  //probability of testing positive for heroin
    bju3 ~ normal( 0 , 5 );  // weakly informative priors for all parameters
    bju2 ~ normal( 0 , 5 );
    bju ~ normal( 0 , 5 );
    ba ~ normal( 0 , 5 );
    a ~ normal( 0 , 5 );
    a_ZIP ~ normal( 0 , 5 );
    for ( i in 1:N ) {
        A[i] = a + a_ZIP[Injury_Zip[i]];  // intercept varies by zip code
    }
    for ( i in 1:N ) {
        p[i] = A[i] + bju * jul[i] + bju2 * jul[i]^2 + bju3 * jul[i]^3 + ba * age[i];
    }
    heroin_tox_screen ~ binomial_logit( 1 , p );  //likelihood of testing positive for heroin, given that you've lethally overdosed
}

The second model is for all-cause overdose counts. I constructed this aggregate count dataset from the individual-level dataset by calculating the number of lethal ODs in each zip code for each year. Note that there are many zip-code:year combinations where 0 overdoses occurred, which means they are represented in this aggregate dataset as 0s but do not appear at all in the individual-level dataset of lethal overdose cases. The Stan code looks like this:

data{
    int<lower=1> N; // number of observations (i.e., all-cause ODs in each zip code each year)
    int<lower=1> N_zip; // number of zip codes
    int od_ct[N];  // outcome: observed number of all-cause ODs in the zip code during a given year
    real year[N];  // year (z-scored)
    int zip[N];  // index for the zip code where the overdoses occurred 
}
parameters{
    real a;  // intercept
    vector[N_zip] a_zip;  // zip-code level varying intercept
    real by;  // coefficient for year 
}
model{
    vector[N] A;  // intercept term
    vector[N] lambda; 
    by ~ normal( 0 , 10 );  // (very) weakly informative priors on all parameters
    a_zip ~ normal( 0 , 10 ); 
    a ~ normal( 0 , 10 );
    for ( i in 1:N ) {
        A[i] = a + a_zip[zip[i]]; // intercept varies by zip code
    }
    for ( i in 1:N ) {
        lambda[i] = A[i] + by * year[i];
    }
    od_ct ~ poisson_log( lambda );
}

At a bit of a loss for how I could approach this hierarchically!

Firstly, if you are not familiar with hierarchical models then you should take some time to read about them properly. Besides the Stan manual, common references are Gelman and Hill’s multilevel modeling book (http://www.stat.columbia.edu/~gelman/arm/) and Gelman et al.’s Bayesian Data Analysis (http://www.stat.columbia.edu/~gelman/book/).

The general idea is that the slopes and intercepts for each individual come from a population distribution,

slope_indv ~ normal(mu_pop, tau_pop)

Then you could make mu_pop and/or tau_pop themselves depend on population-level information, or just come from a weakly-informative prior distribution.

These models become complex pretty quickly and their exact structure depends on the intimate details of your model and what you would like to do with it.

Thanks again for the response. I’ll keep reading and I guess try to get clearer about what I’m trying to do, including the right terminology. I’m not an expert by any means, but I’m familiar with hierarchical models: the first model I posted here treats each zip code intercept as coming from a normal(0, 5) prior distribution.
I guess I’m just trying to figure out if there’s a way to model two outcomes – a count and a binary category in this case – in a way that accounts for the likely relationship between those two outcomes. The count here is # of lethal overdoses and the binary category is whether a lethal overdose involves heroin. It seems likely to me that in years with higher lethal overdose counts, a larger proportion of those overdoses will involve heroin. Apologies for being obtuse/unclear about this. And thank you.

A normal(0, 5) prior does not constitute a hierarchical model. A hierarchical prior models the individual parameters as coming from a population distribution, say normal(mu, tau), where mu and tau are themselves parameters in the model that are fit jointly with the individual-level parameters.
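For concreteness, here is a toy varying-intercept sketch of that difference (not your full model, just the zip-code piece). The scale tau_zip is a parameter fit jointly with the zip-code intercepts rather than being fixed at 5, and the population mean is absorbed into the global intercept a:

data{
    int<lower=1> N;                     // number of individuals
    int<lower=1> N_zip;                 // number of zip codes
    int<lower=0,upper=1> y[N];          // binary outcome for each individual
    int<lower=1,upper=N_zip> zip[N];    // zip code index for each individual
}
parameters{
    real a;                             // global intercept (population mean)
    vector[N_zip] a_zip;                // zip-level deviations from the global intercept
    real<lower=0> tau_zip;              // population spread of the zip-level deviations
}
model{
    a ~ normal( 0 , 5 );
    tau_zip ~ normal( 0 , 5 );          // hyperprior on the population scale
    a_zip ~ normal( 0 , tau_zip );      // hierarchical prior: tau_zip is learned, which induces partial pooling
    for ( i in 1:N )
        y[i] ~ bernoulli_logit( a + a_zip[zip[i]] );
}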

As I noted above, there is not an immediate model to apply here because your problem is ill-posed. Yes, a higher proportion of heroin overdoses could imply a higher overall rate of overdoses in the population, but to incorporate that correlation you have to model it, and there are many, many ways to model it. In order to decide which is most appropriate you have to use your domain-specific expertise.

Also, as I noted above, there are multiple complications to this kind of model. Firstly, the aggregated counts and the individual measurements come from different populations – at the very least the latter is a subset of the former. Secondly, in order to model the aggregated data you either have to model everyone in the population so that you can apply the individual-level model directly, or you have to approximate the aggregate behavior of the population.

To do the former you can apply techniques from multilevel regression and poststratification (MRP). Here you’d hierarchically model the probability of overdosing based on covariate information (age, sex, education status, previous drug use, etc.) and then estimate the population response by using covariate census data to generate hypothetical populations and applying the individual-level model to each one. This requires having detailed census data but is extremely powerful.
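To make the poststratification step concrete, here is a generic sketch of the mechanics (the cell structure, covariates, and census counts are hypothetical placeholders, not your data). The hierarchical individual-level model is fit to the sampled individuals, and its predictions for each demographic cell are then weighted by census counts in generated quantities:

data{
    int<lower=1> N;                        // sampled individuals
    int<lower=1> N_zip;                    // number of zip codes
    int<lower=0,upper=1> y[N];             // binary outcome for each sampled individual
    real age[N];                           // z-scored age
    int<lower=1,upper=N_zip> zip[N];       // zip code of each sampled individual
    int<lower=1> J;                        // poststratification cells (e.g. age band x zip code)
    real cell_age[J];                      // z-scored age midpoint of each cell
    int<lower=1,upper=N_zip> cell_zip[J];  // zip code of each cell
    vector<lower=0>[J] N_cell;             // census count of people in each cell
}
parameters{
    real a;
    real ba;
    vector[N_zip] a_zip;
    real<lower=0> tau_zip;
}
model{
    a ~ normal( 0 , 5 );
    ba ~ normal( 0 , 5 );
    tau_zip ~ normal( 0 , 5 );
    a_zip ~ normal( 0 , tau_zip );         // hierarchical zip-level intercepts
    for ( i in 1:N )
        y[i] ~ bernoulli_logit( a + a_zip[zip[i]] + ba * age[i] );
}
generated quantities{
    real pop_rate;                         // population response, poststratified by census counts
    {
        vector[J] p_cell;
        for ( j in 1:J )
            p_cell[j] = inv_logit( a + a_zip[cell_zip[j]] + ba * cell_age[j] );
        pop_rate = dot_product( N_cell , p_cell ) / sum( N_cell );
    }
}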

To do the latter you can attempt what I had suggested previously. Use a hierarchical model to estimate the population distributions for various slopes and intercepts and then use those population parameters as input into a regression for the aggregate counts.
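Here is one very rough sketch of what such a joint model could look like for your two datasets (the polynomial date terms are dropped for brevity, and the way the zip-level effects enter the count regression, shared and rescaled by a free coefficient b_zip, is only one of many possible choices that your domain expertise would have to justify):

data{
    // individual-level data: people who lethally overdosed
    int<lower=1> N_ind;
    int<lower=1> N_zip;
    int<lower=0,upper=1> heroin_tox_screen[N_ind];
    real jul[N_ind];
    real age[N_ind];
    int<lower=1,upper=N_zip> ind_zip[N_ind];
    // aggregate data: all-cause OD counts per zip code per year
    int<lower=1> N_agg;
    int<lower=0> od_ct[N_agg];
    real year[N_agg];
    int<lower=1,upper=N_zip> agg_zip[N_agg];
}
parameters{
    // heroin-involvement (individual-level) component
    real a_h;
    vector[N_zip] z_zip;       // standardized zip-level effects
    real<lower=0> tau_zip;     // population scale of the zip-level effects
    real bju;
    real ba;
    // all-cause count (aggregate) component
    real a_c;
    real by;
    real b_zip;                // how strongly the zip-level effects carry over to the counts
}
model{
    vector[N_zip] a_zip;
    a_zip = tau_zip * z_zip;   // hierarchical (non-centered) zip-level intercepts
    // priors
    a_h ~ normal( 0 , 5 );
    z_zip ~ normal( 0 , 1 );
    tau_zip ~ normal( 0 , 5 );
    bju ~ normal( 0 , 5 );
    ba ~ normal( 0 , 5 );
    a_c ~ normal( 0 , 5 );
    by ~ normal( 0 , 5 );
    b_zip ~ normal( 0 , 5 );
    // individual-level likelihood: heroin involvement given a lethal overdose
    for ( i in 1:N_ind )
        heroin_tox_screen[i] ~ bernoulli_logit( a_h + a_zip[ind_zip[i]]
                                                + bju * jul[i] + ba * age[i] );
    // aggregate likelihood: all-cause counts share the same zip-level structure
    for ( j in 1:N_agg )
        od_ct[j] ~ poisson_log( a_c + b_zip * a_zip[agg_zip[j]] + by * year[j] );
}

The point of a structure like this is that both likelihoods inform the same zip-level parameters, so their shared uncertainty, and any correlation between the two components, propagates coherently into whatever predictions you compute afterwards in generated quantities.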

In either case you have to build a joint model where the population and the individuals are modeled coherently. When building these models it helps to think generatively: in the first case that means modeling individuals and then building up a population one by one, and in the second case it means modeling the population first and then drawing individuals from that population. Both can be valid and useful.

Okay I hear you. Thank you very much for the detailed responses. This has been extremely helpful!