Data fusion of a Poisson count source and a Gaussian source for its expected value?

How can I fuse two data sources?
One sensor follows a Gaussian distribution and the other sensor’s values are count data. Both sensors share the same unknown latent source, which we want to estimate.

I tried the following approach in Stan.

data {
  int<lower=1> N;
  vector[N] Yg;
  int<lower=0> Yp[N];
}

parameters {
  real mu;
  real<lower=0> sigma_gauss;
}

model {
  mu ~ normal(0, 5);
  sigma_gauss ~ std_normal();
  Yg ~ normal(mu, sigma_gauss);
  Yp ~ poisson_log(mu);
}

I’m unclear what a good solution in Stan would be.
We know that the variance of the Poisson is exp(mu), and we also have the standard deviation of the Gaussian source, sigma_gauss. Shouldn’t we use them, for example like this?


real sensor_fusion(real signal1, real sigma_signal1,
                   real signal2, real sigma_signal2) {
  return (signal1 * sigma_signal2 + signal2 * sigma_signal1)
         / (sigma_signal1 + sigma_signal2);
}

The data-generating process is:

library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = 4)

mu <- log(1)
sigma <- 0.7
N <- 10
Ngauss <- N
Npois <- N
Yg <- rnorm(Ngauss, mu, sigma)
Yp <- rpois(Npois, exp(mu))


sdat <- list(
    N = N
  , Yg = Yg
  , Yp = Yp
)

smod <- stan_model("twosources_poisnormal.stan")
lm_waic <- sampling(smod, data = sdat,
                    iter=10000,
                    chains=1,
                    seed = 123
                    , control = list(
                      adapt_delta = 0.8,        #default=0.8
                      max_treedepth = 10
                    ))
extr <- extract(lm_waic)

Or a mixture model?

Any ideas?

Hi,
I am not an expert on this, but your proposed solution seems OK - it definitely matches your simulated data. Are you asking about general considerations or are you having a problem with fitting the model?

Regarding the Poisson variance exp(mu) and sigma_gauss: it looks like you are already using those in the model. What exactly do you have in mind?
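To illustrate how the model already uses both noise levels, here is a minimal base R sketch (no Stan needed) that evaluates the posterior for mu on a grid, once per source and once for the joint likelihood. It is only a sketch under assumptions that are mine, not from the original post: sigma_gauss is held fixed at its simulated value instead of being estimated, the grid range of -3 to 3 is arbitrary, and `normalise` and `post_sd` are hypothetical helper names.

```r
# Grid approximation of the posterior for mu under the model in this thread.
# Assumption: sigma is treated as known (the Stan model estimates it instead).
set.seed(123)
mu_true <- log(1)
sigma   <- 0.7
N       <- 10
Yg <- rnorm(N, mu_true, sigma)      # Gaussian sensor
Yp <- rpois(N, exp(mu_true))        # count sensor

mu_grid <- seq(-3, 3, length.out = 2001)   # assumed range for mu

# Log-likelihood of each source at every grid value of mu
ll_gauss  <- sapply(mu_grid, function(m) sum(dnorm(Yg, m, sigma, log = TRUE)))
ll_pois   <- sapply(mu_grid, function(m) sum(dpois(Yp, exp(m), log = TRUE)))
log_prior <- dnorm(mu_grid, 0, 5, log = TRUE)   # same prior as the Stan model

# Turn an unnormalised log density on the grid into probabilities
normalise <- function(lp) {
  p <- exp(lp - max(lp))
  p / sum(p)
}

post_gauss <- normalise(log_prior + ll_gauss)
post_pois  <- normalise(log_prior + ll_pois)
post_joint <- normalise(log_prior + ll_gauss + ll_pois)  # both sources fused

# Posterior standard deviation of mu on the grid
post_sd <- function(p) {
  m <- sum(mu_grid * p)
  sqrt(sum((mu_grid - m)^2 * p))
}

sds <- c(gauss = post_sd(post_gauss),
         pois  = post_sd(post_pois),
         joint = post_sd(post_joint))
print(sds)
```

Adding the two log-likelihoods is exactly what the two sampling statements in the Stan model do. In mu, the Gaussian term has curvature N / sigma^2 and the Poisson term has curvature N * exp(mu), so each source is weighted by its own precision without any extra fusion function.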

A mixture model would probably not be appropriate here - a mixture would only make sense if you had one measurement for each data point and didn’t know which of the two sources it came from.

Thanks Martin. I have a medical data set with missing data, and I’d like to impute the missing values.
Before I do so, it’s good to learn what works best on a well-understood set of simulated data.
So let’s start with normal and count data, then normal and binomial data, and then use that knowledge to
build a “unified” model.
I adjusted the model by adding a random intercept for each count observation.

The issue is not that the model doesn’t work. With all the Kalman filter theory out there, there must be a best-practice method. I wonder what a good solution in Stan would be?

I believe that for the simple case you have described in the original question, your solution is OK (you could do Simulation-Based Calibration, if you want to be very certain that your model works, but that requires non-trivial additional effort).

I don’t really understand Kalman filters (only the stuff I remember from school + a quick glance at Wikipedia), but if I get it correctly, a Kalman filter is an efficient computational method to compute (or, for non-Gaussian errors, approximate) a specific form of posterior distribution. If you encode the same distribution in Stan, you should get an accurate answer, but with more computational resources needed. For Gaussian errors the answers of the Kalman filter and Stan should coincide. So I am not sure how that relates to your original question.
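As a sketch of the Gaussian case (the notation is mine, not from the thread): take two measurements $y_1$ and $y_2$ of the same static quantity $\mu$, with known noise variances $\sigma_1^2$ and $\sigma_2^2$ and a flat prior on $\mu$. Treating $y_1$ as the current estimate and $y_2$ as the new measurement, the Kalman update is

$$
K = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}, \qquad
\hat{\mu} = y_1 + K\,(y_2 - y_1) = \frac{y_1\,\sigma_2^2 + y_2\,\sigma_1^2}{\sigma_1^2 + \sigma_2^2}, \qquad
\operatorname{Var}(\hat{\mu}) = \frac{\sigma_1^2\,\sigma_2^2}{\sigma_1^2 + \sigma_2^2}
$$

which is also the posterior mean and variance that a Stan model with two normal likelihoods would recover. This has the same shape as the sensor_fusion function from the question, provided its sigma arguments are read as variances; if they are standard deviations, they would need to be squared. With a Poisson count as the second source there is no such closed form, which is where sampling the joint posterior comes in.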

In any case, if you have further questions about Kalman filters or about models other than the one that started this thread, you should probably start a new thread where people more knowledgeable about the topic than me can chime in.

I cannot speak for the whole community, but I think the general vibe here is that best practice is to understand your problem deeply, develop solutions specific to your problem, and not rely too much on what worked for other people, as your data might easily be quite different. Although it is possible, I wouldn’t bet that there are best practices for your particular problem, but there are best practices for model development, as outlined for example in Mike Betancourt’s “Towards A Principled Bayesian Workflow” and Robust Statistical Workflow with RStan. The visualisation paper is also helpful.

With that in mind, a good place to ask about best practice in the clinic is IMHO the Data Methods forums: https://discourse.datamethods.org/ as there are way more clinical statistician contributors than here.
