Hi there, I’m looking for advice about the best way to link two models that predict the same response together in Stan. Specifically, for a given location, I’m trying to simultaneously model the total biomass of a population, as well as the counts of the component members of the population by size category. I can model each separately successfully, but would like to combine them such that the information from the total model informs the count model if possible. I think there must be a way to do this via either the ODE solvers or the algebra solver, but I’ve no experience and need some friendly advice to get me going down the right path.
Here is a simplified example of the total model:
// total model
data {
int<lower=1> N;
int<lower=1> P;
real<lower=0> bio[N];
matrix[N, P] X;
}
parameters {
real kappa0; // intercept
vector[P] kappa; // biomass predictors
real<lower=0> inv_beta; // rate
}
transformed parameters {
real <lower=0> alpha[N];
real <lower=0> beta[N];
alpha = exp(kappa0 + X * kappa);
beta = 1 / inv_beta;
}
model {
target += normal_lpdf(kappa0 | 0, 0.5);
target += normal_lpdf(kappa | 0, 0.1);
target += normal_lpdf(inv_beta | 0, 0.5);
target += gamma_lpdf(bio | alpha, beta);
}
And here is a simplified example of the count Poisson hurdle model:
// count model
data {
int<lower=1> N;
int<lower=1> n_sizes;
int<lower=1, upper = n_sizes> size[N];
int<lower=0> y[N]; // count in each size
}
parameters {
real<lower=0, upper=1> theta[n_sizes];
real log_lambda[n_sizes];
}
model {
target += beta_lpdf(theta | 5, 1);
target += normal_lpdf(log_lambda | 0, 1);
for (n in 1:N) {
if (y[n] == 0) {
target += bernoulli_lpmf(1 | theta[size[n]]);
} else {
target += bernoulli_lpmf(0 | theta[size[n]]);
target += poisson_log_lpmf(y[n] | log_lambda[size[n]]);
}
}
Note that I’ve tried using the biomass predictors directly in the count model with some success, but for various reasons the predictors work better on the total rather than count by size class, however count by size class is really the true prediction problem, which is why I’d like to link the two.
So with the two models I have two different kinds of predictions for total biomass at a given location:
\textrm{biomass}_{total} \sim \textrm{Gamma}(\alpha, \beta)
\textrm{biomass}_{total} = \sum_{s=1}^S \textrm{size}_s * \textrm{count}_s
\textrm{count}_s \sim \textrm{Poisson Hurdle}(\theta_s, \lambda_s)
I guess what I’d really like to do is to condition the count model such that when integrated over sizes the totals that are within a certain tolerance of the the totals from the Gamma model. Any advice as to how to go about this would be appreciated.