Hi there,
I have a model with a positive and discrete target variable Y \sim NegativeBinomial(\mu_1, \phi_1) and a latent variable L \sim Normal^{+}(\mu_2, \sigma_2). L is also positive and discrete, but I approximate it with a (truncated) normal distribution since Stan cannot sample discrete parameters.
The Stan model is more complex but overall looks like this:
data {
  int<lower=1> N;           // number of observations
  int<lower=1> M;           // number of predictors
  array[N] int<lower=0> Y;  // target variable
  matrix[M, N] X;           // predictors
  ...
}
parameters {
  vector<lower=0>[N] L;     // latent variable (continuous approximation)
  vector<lower=0>[N] phi_1;
  vector[M] theta;
  ...
}
transformed parameters {
  vector<lower=0>[N] mu_1;
  vector<lower=0>[N] mu_2;
  vector<lower=0>[N] sigma_2;
  ...
  mu_2 = f1(X, theta);
  sigma_2 = f2(mu_2);
  mu_1 = f3(L);
}
model {
  ...
  L ~ normal(mu_2, sigma_2);
  target += neg_binomial_2_lpmf(Y | mu_1, phi_1);
}
There are three options for how I can compute/sample L (see the sketch after this list):
- Assume that L = \mu_2, i.e., no sampling; I just take the estimated mean.
- Sample L with \sigma_2 = \sqrt{\mu_2}, i.e., no over-dispersion (\rightarrow Poisson).
- Sample L with \sigma_2 = \sqrt{\mu_2 (1 + \mu_2 / \phi_2)}, i.e., with over-dispersion (\rightarrow Negative Binomial).
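For concreteness, here is a minimal sketch of how the three variants differ in the blocks above (only one variant is active at a time; \phi_2 is the additional dispersion parameter that only option 3 introduces):

transformed parameters {
  ...
  // option 1: no sampling; L = mu_2 directly, so L would move here
  //           (declared as a transformed parameter instead of a parameter)
  // option 2: variance equal to the mean (Poisson-like)
  sigma_2 = sqrt(mu_2);
  // option 3: with over-dispersion (Negative-Binomial-like); phi_2 is an
  //           extra parameter declared with lower=0
  // sigma_2 = sqrt(mu_2 .* (1 + mu_2 / phi_2));
}
model {
  ...
  // options 2 and 3 only; option 1 drops this statement
  L ~ normal(mu_2, sigma_2);
}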
I gradually extended the model to see what I can estimate and what might be “too much”. Along the way, I encountered two issues that seem weird and that I cannot explain:
1. Issue: Option 3 samples well within an hour, but option 2 samples seemingly forever; so far I have always stopped it after an hour without progress. Originally, I would have expected option 3 to sample longer, because it introduces an additional parameter (\phi_2) that may not be feasible to estimate. Is that generally a false assumption, i.e., does it depend on the data and model as a whole whether option 2 or 3 samples faster?
2. Issue: I get quite different results for my parameters of interest \theta with option 1 than with option 3, so I wanted to run a sanity check that my implementation of option 3 is correct, although I am pretty sure it is. For that, I set \sigma_2 = 0.00001 (see the snippet below). Since \mu_2 can become quite large, this model should give practically the same results as option 1. Yet, somehow, the model with \sigma_2 = 0.00001 also samples forever (I had to stop it as well). Why is that? Does it have to do with the sampling of L being too constrained when \sigma_2 is this low?
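For reference, the check simply pins the scale in the transformed parameters block (a sketch; everything else stays as in option 3):

transformed parameters {
  ...
  // sanity check: sigma_2 pinned near zero, so L should collapse onto mu_2
  sigma_2 = rep_vector(1e-5, N);
}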
I would be very happy if you could help me. Hopefully, I am overlooking something stupid.
Many thanks in advance.
Best,
Nic