Hello - this is my first time posting on the Stan Forums. I am trying to write a model that jointly (1) estimates the Poisson rate for some observed count data, (2) uses that estimate to calculate some unobserved counts, and (3) models those unobserved counts with respect to some predictor variable.
Here is a simplified version of what I am working with: I have some data on the number of seeds per flower (y_2) and the number of flowers per plant (y_1) (note: y_2 is only measured for a subset of plants and flowers). These data are intended to be used together to represent the total number of seeds per plant (Y). I would then like to model Y using some predictor variable (x). However, considering that Y is not directly observed and the number of seeds per flower can be variable, I was thinking I could treat Y as an unobserved/latent variable in my model.
So far, I’ve been able to write a model that is getting close to what I want, but not fully there:
data {
int<lower=0> n2; // number of observations for y2
array[n2] int<lower=0> y2; // number of seeds per flower
int<lower=0> n1; // number of observations for y1, x
array[n1] int<lower=0> y1; // number of flowers
vector[n1] x; // predictor
}
parameters {
real<lower=0> log_lambda; // log Poisson rate
real a; // intercept
real b; // slope
}
transformed parameters {
real<lower=0> lambda; // Poisson rate
lambda = exp(log_lambda);
}
model {
// model number of seeds per flower (y2)
log_lambda ~ normal(2, 1);
y2 ~ poisson(lambda);
// model number of flowers (y1)
// **I would like to model the total number of seeds per plant (Y) instead**
a ~ normal(0, 1);
b ~ normal(0, 1);
y1 ~ poisson(exp(a + b*x));
}
generated quantities {
array[n1] int Y; // total number of seeds per plant
for (i in 1:n1) {
Y[i] = poisson_rng(lambda)*y1[i];
}
}
I am estimating the Poisson rate \lambda for y_2 and the relationship between y_1 and x in the same model, then generating Y in the generated quantities
block. However, for various reasons, I would like to model the relationship between Y (not y_1) and x. To do this, I think I need to make Y a parameter (essentially missing data) and have something like Y ~ poisson(lambda)
in the model
block, but I also understand that Stan does not accept integer parameters.
I was hoping someone could help me figure out what could be done here. My main goal of trying to write the model like this is to carry forward uncertainty in Y, as opposed to deterministically calculating Y as mean(y2)*y1
(especially since I don’t think this uncertainty will be the same across observations), so I’m open to other approaches to handling the data/model that would allow me to do this. Thank you!