Latent variable in two likelihoods

I have two data sets describing two related perspectives on the same phenomenon. The first one has daily counts of articles on a topic in a region. The second has ratios: article counts divided by the total amount of regionally limited news coverage (i.e. the total article count). I use a hierarchical model to estimate the news coverage in W periods (survey waves). I use a Poisson distribution for the former (count data) and a Beta distribution for the latter (ratio data; alternatively, I have considered using an Exponential distribution). I get the expected results when estimating them in separate Stan models.


(Figure: wave means of data set 1 (left) and data set 2 (right))

Since each data set uses a slightly different methodology and draws on different data sources, yet contains valuable (and potentially competing) information, I would now like to combine them. I want to estimate a latent variable fed by information from data sets 1 and 2, ideally allowing me to quantify how the two pieces of information are balanced.

I have tried multiple parametrisations and approaches, but I always end up with data set 1 dominating the result, i.e., the latent variable is (almost) identical to its estimate from a model in which I use only data set 1.

Eventually, I would like to use the latent variable as a parameter in a final regression likelihood that I would include as a third likelihood in the model.

As a Stan & Bayesian statistics beginner, I realise I may be on the wrong path overall, but couldn’t get further than this. I would be happy about any hint on where to go!

data {
  int<lower=0> N;   // number of article counts (data set 1)
  int<lower=0> M;   // number of article ratios (data set 2)
  int<lower=0> W;   // number of waves (time steps)
  array[N] int w_n; // mapping daily observations to wave
  array[M] int w_m; // mapping daily observations to wave
  array[N] int y_n; // outcome: article counts (data set 1)
  vector[M] y_m;    // outcome: article ratios (data set 2)
}
parameters {
  vector<lower=0>[W] alpha;  // latent variable: wave news coverage
  real<lower=0> rho;  // gamma rate for the alpha prior
  real<lower=0,upper=1> delta; // transformation factor (data set 1)
}
model {
  delta ~ gamma(1, .001);
  alpha ~ gamma(1.5, rho);
  rho ~ gamma(1, .01);
  
  y_n ~ poisson(alpha[w_n] ./ delta);
  y_m ~ beta(1, 1/alpha[w_m]);
}

I think my general questions behind this are:

  • How is it possible to combine information coming from the same data-generating process when it is available in different shapes (i.e. distributions), here Poisson vs. Beta?
  • More generally, how would I “combine” two competing pieces of information? By using an additive model with weights, by simply multiplying their distributions, or by the kind of latent-variable approach to modelling the data-generating process? And how can I determine the balance (“weights”) of the mix of information?
  • If my approach is not entirely wrong, is it possible that something related to the summed log likelihood when the model is evaluated gives one of the two likelihood statements less weight? If so, how would I influence this?

I would be very thankful to be pointed to the right section of the literature; I didn’t quite know what to look for.

To answer your questions:

A. Yes. Writing two likelihoods for the same parameters is the right way to go if the two data streams are generated from the same underlying process (perhaps, for example, with different measurements and thus a different measurement error model).

B. In what sense are the two pieces of information competing? You want to take two data sets that both inform the same parameters, not that try to pull them in opposite directions. For example, you might measure someone’s blood pressure and also their heart rate and combine both pieces of information into some kind of measure of heart health using a single parameter and two likelihoods.
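To make the “one parameter, two likelihoods” idea concrete, here is a toy grid-approximation sketch (not the thread’s Stan model): a single latent alpha in (0, 1) is observed through a Poisson channel and a Beta channel, and the two log likelihoods are simply summed. All numbers, the count scale of 100, and the Beta precision of 20 are made up for illustration.

```python
import math

# Hypothetical observations of the same latent alpha through two channels
y_counts = [28, 35, 31]          # channel 1: Poisson counts, assumed mean alpha * 100
y_ratios = [0.25, 0.40, 0.33]    # channel 2: Beta ratios, assumed mean alpha, precision 20

def log_poisson(y, mu):
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def log_beta(y, a, b):
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (a - 1) * math.log(y) + (b - 1) * math.log(1 - y) - log_norm

grid = [i / 1000 for i in range(1, 1000)]   # flat prior over (0, 1)
logp = [sum(log_poisson(y, a * 100) for y in y_counts)
        + sum(log_beta(y, a * 20, (1 - a) * 20) for y in y_ratios)
        for a in grid]
m = max(logp)
weights = [math.exp(lp - m) for lp in logp]
post_mean = sum(a * w for a, w in zip(grid, weights)) / sum(weights)
print("posterior mean of alpha:", post_mean)
```

Both data sources pull on the same alpha; how much each one contributes is determined automatically by the curvature (informativeness) of its likelihood, not by any explicit weight.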

C. I suspect the problem you’re having may be due to the weakness of beta regression. You are probably getting a lot more information from the Poisson. If you build the two models separately, how much posterior uncertainty is there in each? That will give you a gauge of how much information both data sets are contributing.
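Point C can be illustrated with simulated data (all constants here are hypothetical): drawing the same number of observations from a Poisson channel and from the beta(1, 1/alpha) channel used in the original post, the Poisson-based estimates of alpha are typically far less variable.

```python
import numpy as np

# Simulation sketch: compare how precisely each channel pins down a shared
# latent alpha, using simple per-replication maximum-likelihood estimates.
rng = np.random.default_rng(42)
true_alpha, scale = 0.3, 100.0    # assumed latent value and Poisson scaling
n_obs, n_reps = 50, 500

pois_mles, beta_mles = [], []
for _ in range(n_reps):
    y_n = rng.poisson(true_alpha * scale, size=n_obs)
    pois_mles.append(y_n.mean() / scale)             # MLE of alpha from counts
    y_m = rng.beta(1.0, 1.0 / true_alpha, size=n_obs)
    b_hat = -n_obs / np.sum(np.log1p(-y_m))          # MLE of b in beta(1, b)
    beta_mles.append(1.0 / b_hat)                    # implied alpha = 1 / b

print("sd of alpha estimates from the Poisson channel:", np.std(pois_mles))
print("sd of alpha estimates from the Beta channel:  ", np.std(beta_mles))
```

When the two likelihoods are combined, the posterior is pulled toward the more informative (lower-variance) channel, which matches the observation that data set 1 dominates.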

A couple of other comments:

  1. Why use an unbounded prior like a gamma(1, 0.001) over a parameter that’s constrained to (0, 1)? Stan will automatically do the truncation for you, so that’s OK. But in Stan’s shape/rate parameterization the mean of gamma(1, 0.001) is 1/0.001 = 1000. Is that what you intended? That can wind up being hard to fit because it pushes the parameter toward the upper boundary of its constraint.

  2. I don’t have much experience with hierarchical gamma priors like you have for alpha. The prior on rho is also pushing it toward large values: in Stan’s shape/rate parameterization, gamma(1, 0.01) has a mean of 1/0.01 = 100.

  3. You don’t need ./ to divide a vector by a scalar—plain / will work.

  4. That beta(1, 1/x) prior winds up having a mean of 1 / (1 + 1/x). The expected value for alpha in the prior is going to be around 1.5 * .01 = 0.015, I think, which is going to put the prior mean of the beta very close to 0. Is that the intent here?
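This kind of prior-mean arithmetic is easy to check numerically; the plug-in value for alpha below is only illustrative (rho taken at its prior mean of 100 under the shape/rate parameterization).

```python
import numpy as np

# Check the mean of a beta(1, 1/alpha) distribution at a plug-in alpha
alpha = 1.5 / 100                              # illustrative prior-mean plug-in
analytic_mean = 1 / (1 + 1 / alpha)            # mean of Beta(1, 1/alpha)
empirical_mean = np.random.default_rng(0).beta(1, 1 / alpha, size=200_000).mean()
print(analytic_mean, empirical_mean)           # both around 0.0148
```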


Hi @Bob_Carpenter, many thanks for the very detailed comments. Sorry for getting back with delay.

I believe that I have indeed a situation where I have two measurements / measurement models with the same underlying process.

I said “competing” to express that the two data sets are not equivalent even when projected onto the same scale. I want to combine information from both, and where they conflict, interpret that as samples from a noisy process and let the model capture higher uncertainty where relevant.

Indeed, it seems like the beta regression gives me more uncertain parameter information. I followed your advice and modelled the two data sets separately, with the results confirming your hypothesis as far as I can see. The figures below show 90% CIs (whisker length) and densities of the Poisson/Beta parameters’ posterior distributions, roughly projected onto the same scale (in line with the model code in the OP). The data sets have exactly the same number of observations per wave.

Also, thanks for the other comments and hints; I considered them carefully. In some cases it was the intended behaviour, in others I reparameterised the model. However, the weakness-of-beta-regression phenomenon remained unchanged.

Assuming the Beta regression is the root of the problem (and considering that I had very similar observations when initially using an Exponential distribution, and a Gamma too):

  • What causes the weakness? Is that a general feature of the Beta distribution, or introduced by the way I parameterise it, or specific to the data fitted?
  • In case I had no other choice but to parameterise the model that way, how would I, in view of the figures above, model a joint latent parameter that gives more weight to the Beta information so that it contributes more, compensating for the regression weakness?

I sense that a gap in my theoretical knowledge prevents me from fully understanding what I’m observing here; I’d be thankful if you could give me a cue for some further reading. The same goes for modelling techniques that I’m likely not even aware of, which leave me stuck. Thanks a lot!

(By the way, as a solution to just the case at hand, I found a workaround that should have been obvious to me all along: multiplying the ratios by the assumed distribution location of the normalisation factor, i.e. the assumed (mean) total daily count, and passing their integer version to a Poisson as well. It works in this special case; results below. I’m still interested in how I could have got there without dropping the Beta distribution.)
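A minimal sketch of this workaround transformation, with made-up numbers (the assumed total daily count and the stand-in ratio distribution are purely illustrative):

```python
import numpy as np

# Scale ratios by an assumed mean total daily article count and round to
# integers, so they can be passed to a Poisson likelihood as pseudo-counts.
rng = np.random.default_rng(1)
assumed_total = 100                       # hypothetical mean total daily count
ratios = rng.beta(2, 5, size=10)          # stand-in for the ratio data set
pseudo_counts = np.rint(ratios * assumed_total).astype(int)
print(pseudo_counts)                      # integer outcomes for a Poisson model
```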


results in: (figure)

Could you clarify why you use this particular parameterization of how the latent variable affects the distribution of ratios? In the case of the Poisson data, the (scaled) latent variable maps to the mean (and variance) of the Poisson distribution. In the Beta case, it affects the spread of the distribution away from 0 (assuming all alphas are less than 1). I wonder if you would have more luck if the latent variable directly affected the mean of the Beta distribution, with a separate scale parameter.

Following your observation that you could parameterize with two Poissons, maybe you could make some fake data, simulate under the different scenarios (a latent variable that generates a Poisson and a Beta vs. an LV that generates two Poissons that are in turn used to calculate a ratio), and see if you have any more luck with this particular model.
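A fake-data sketch of the first scenario (one latent variable generating both a Poisson and a mean-parameterized Beta outcome); the wave count, observations per wave, count scale, and Beta precision are all invented for illustration:

```python
import numpy as np

# One latent alpha per wave drives both outcome types
rng = np.random.default_rng(7)
W, per_wave = 5, 30
alpha = rng.uniform(0.1, 0.6, size=W)     # latent wave-level coverage in (0, 1)
count_scale, precision = 100.0, 50.0      # hypothetical Poisson scale, Beta precision

wave = np.repeat(np.arange(W), per_wave)
y_counts = rng.poisson(alpha[wave] * count_scale)             # simulated data set 1
y_ratios = rng.beta(alpha[wave] * precision,
                    (1 - alpha[wave]) * precision)            # simulated data set 2, mean = alpha

# Both wave-level summaries should recover the latent alpha up to sampling noise
for w in range(W):
    print(w, round(alpha[w], 3),
          round(y_counts[wave == w].mean() / count_scale, 3),
          round(y_ratios[wave == w].mean(), 3))
```

Fitting the model to data like this, where the true latent values are known, makes it easy to see whether the estimated alphas actually track both channels or just one.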

Thanks a lot, @simonbrauer, I think that was the crucial clue. I had indeed wrongly assumed that with a 1/alpha parameterisation of the Beta / Gamma / Exponential distributions, I had (something similar to) a mean parameter that I could then estimate alongside the Poisson mean.

When I use the following parameterisation, based on the suggested form in the manual, I get the expected results:

y_m ~ beta(alpha[w_m] * delta, (1-alpha[w_m]) * delta);
y_n ~ poisson(alpha[w_n] * delta);
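As a quick sanity check on this mean/precision form (with arbitrary illustrative values, outside the model): beta(alpha * delta, (1 - alpha) * delta) has mean alpha and variance alpha * (1 - alpha) / (delta + 1), so the latent variable now directly controls the location of the ratio distribution.

```python
import numpy as np

# Verify the moments of the mean/precision Beta parameterization numerically
alpha, delta = 0.3, 40.0
draws = np.random.default_rng(0).beta(alpha * delta, (1 - alpha) * delta,
                                      size=200_000)
print(draws.mean())   # close to alpha = 0.3
print(draws.var())    # close to 0.3 * 0.7 / 41
```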