I (a fairly new Stan user) am trying to generate parameter estimates for the following scenario:
- I have N observations and P predictors. Each predictor is binary, representing whether a certain activity was partaken in
- Each predictor has an associated, unknown risk \theta_j \in [0, 1] for j \in \{1, ..., P\}
- The event that an incident occurs during activity j for observation i is I_{i, j} \sim \text{Ber}(X_{i,j} \cdot \theta_j)
- The total number of incidents for observation i (which is not recorded) is Z_i = \sum_{j=1}^N I_{i, j}
- Finally, the variable we have available is Y_i = \mathbb{1}\{Z_i > 0\}
I’ve calculated the pmf of Y_i to be
p_{Y_i}(y) = \begin{cases} 1 - \prod_{j=1}^P (1-X_{i,j} \cdot \theta_j) & y =1 \\ \prod_{j=1}^P (1-X_{i,j} \cdot \theta_j) & y = 0 \end{cases}
This is where I start to get stuck. If my understanding is correct, I want to increment target
by the logarithm of this function. Taking the logarithm of the case y=0 is simple, but things get messy for y=1. Are there any tips or advice on how I could best implement this model in Stan? Moreover, as my desired dataset is quite large, an efficient solution would be appreciated.
For reference, my Stan code is
data {
int<lower=0> N; // number of observations
int<lower=0> P; // number of activities
matrix[N, P] X; // activity occurrences
vector<lower=0, upper=1>[N] y; // at least one incident
}
parameters {
vector<lower=0, upper=1>[P] theta; // activity risks
}
model {
theta ~ uniform(0, 1); // prior
// How to modify target?
}
As for data, I am currently prototyping the model, so I am using random X and \theta.
EDIT
I should point out that I am aware you can do these computations rather directly as
model {
real s;
real t;
theta ~ uniform(0, 1); // prior
for (i in 1:N) { // likelihood
s = 1 - 2 * y[i];
t = 1;
for (j in 1:P) {
t *= 1 - X[i, j] * theta[j];
}
t += y[i];
target += log(t);
}
}
I just worry that this approach will be too slow for large models and risk numerical precision errors.