Long sampling for normal parameter with low sigma or no over-dispersion

Hi there,

I have a model with a positive, discrete target variable Y \sim NegativeBinomial(\mu_1, \phi_1) and a latent variable L \sim Normal^{+}(\mu_2, \sigma_2). The true L is also positive and discrete, but since I cannot sample discrete parameters, I approximate it with a (truncated) normal distribution.

The Stan model is more complex but overall looks like this:

data {
  int<lower=1> N;           // number of observations
  int<lower=1> M;           // number of predictors
  array[N] int<lower=0> Y;  // target variable
  matrix[M, N] X;           // predictors
  ...
}

parameters {
  vector<lower=0>[N] L;     // continuous approximation of the discrete latent variable
  vector<lower=0>[N] phi_1;
  vector[M] theta;
  ...
}

transformed parameters {
  vector<lower=0>[N] mu_1;
  vector<lower=0>[N] mu_2;
  vector<lower=0>[N] sigma_2;

  ...

  mu_2 = f1(X, theta);
  sigma_2 = f2(mu_2);       // one of the three options below
  mu_1 = f3(L);
}

model {
  ...
  L ~ normal(mu_2, sigma_2);
  target += neg_binomial_2_lpmf(Y | mu_1, phi_1);
}

There are three options for how I can compute/sample L (see the note after the list on how they relate):

  1. Assume that L = \mu_2, i.e., no sampling; I just take the estimated mean.
  2. Sample L with \sigma_2 = \sqrt{\mu_2}, i.e., no over-dispersion (\rightarrow Poisson).
  3. Sample L with \sigma_2 = \sqrt{\mu_2 * (1 + \mu_2 / \phi_2)}, i.e., with over-dispersion (\rightarrow Negative Binomial).
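
(Note on how the options relate: option 3 nests the other two as limiting cases, since \sqrt{\mu_2 * (1 + \mu_2 / \phi_2)} \to \sqrt{\mu_2} as \phi_2 \to \infty, which is option 2, and since the distribution of L collapses onto \mu_2 as \sigma_2 \to 0, which is option 1.)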

I extended the model step by step to see what I can estimate and what might be “too much”. Along the way, I encountered two issues that seem odd and that I cannot explain:

1. Issue: Option 3 finishes sampling well within an hour, but option 2 seems to sample forever; so far I have always stopped it after an hour without progress. I would originally have thought that option 3 samples longer because it introduces an additional parameter that may not be feasible to estimate. Is that generally a false assumption, i.e., does it depend on the data and model as a whole whether option 2 or 3 samples faster?
2. Issue: I get quite different results for my parameters of interest \theta with option 1 than with option 3, so I wanted to do a sanity check that my implementation of option 3 is correct, although I am fairly sure it is. For that, I set \sigma_2 = 0.00001. Since \mu_2 can become quite large, this model should give (nearly) the same results as option 1… Yet, somehow, the model with \sigma_2 = 0.00001 also samples forever (I had to stop it as well). Why is that? Does it have something to do with the sampling of L being too constrained when \sigma_2 is this low?

I would be very happy if you could help me. Hopefully, I am overlooking something stupid.

Many thanks in advance.

Best,
Nic

Sorry for the slow response!

Yeah, it can happen that the model with more parameters samples faster.

A different example of this happens with the horseshoe prior (https://arxiv.org/pdf/1707.01694.pdf), where there’s an alternative parameterization based on a scale mixture of normals that uses a bunch more parameters than sampling the underlying distribution directly (look at appendix C.1 vs. C.2).
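
Roughly, the idea there (just a sketch of the general scale-mixture trick, not the paper's exact appendix code) is that a half-Cauchy scale can be rewritten as a half-normal times the square root of an inverse-gamma variable, so each heavy-tailed scale parameter is traded for two better-behaved ones. You can check the equivalence by simulation in R:

set.seed(1)
n <- 1e6

# sample a half-Cauchy(0, 1) scale directly
lambda_direct <- abs(rcauchy(n, location = 0, scale = 1))

# scale-mixture-of-normals version: |N(0, 1)| * sqrt(inv-gamma(1/2, 1/2))
z <- abs(rnorm(n))
v <- 1 / rgamma(n, shape = 0.5, rate = 0.5)
lambda_mixture <- z * sqrt(v)

# both sets of draws follow the same distribution (up to Monte Carlo error)
quantile(lambda_direct,  probs = c(0.5, 0.9, 0.99))
quantile(lambda_mixture, probs = c(0.5, 0.9, 0.99))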

Since both models 1 and 3 are approximations, I think what you want to do is generate data from a known non-approximate model (where you aren’t doing the continuous approximation) and then check how well your inferences work with approximate models 1 and 3.

I don’t think it matters so much how well model 3 can approximate model 1 (though the small sigma probably could mess up sampling); what matters is that they approximate your actual problem in some way. You’ll probably still be able to find problems this way (if model 1 fits and model 3 doesn’t, that will be suspicious).

Thank you very much, that helps, and I will check out the horseshoe prior example further!

Regarding:

generate data from a known non-approximate model (where you aren’t doing the continuous approximation)

How would I be able to generate data from the non-approximate model? The reason I am doing the approximation in the first place is that I cannot sample the latent variable from a discrete distribution.

If you assume values for all the parameters, then no need for MCMC. Do all the sampling in R (rnbinom and friends)!
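
For example (just a sketch; I'm making up simple forms for f1 and f3 here since I don't know your real ones, namely a log-linear f1 and f3(L) = L + 1, plus arbitrary "true" parameter values), generating data from the non-approximate model could look something like this:

set.seed(123)

N <- 200                      # number of observations
M <- 3                        # number of predictors
theta <- c(0.5, -0.3, 0.2)    # assumed "true" parameters of interest
phi_1 <- 10                   # assumed "true" over-dispersion of Y
phi_2 <- 5                    # assumed "true" over-dispersion of the latent variable

X <- matrix(rnorm(M * N), nrow = M, ncol = N)
mu_2 <- exp(as.vector(theta %*% X))       # f1(X, theta), assumed log-linear here

# the latent variable stays discrete: no continuous approximation
# either Poisson (no over-dispersion, matching option 2) ...
L_pois <- rpois(N, lambda = mu_2)
# ... or negative binomial (with over-dispersion, matching option 3)
L_nb <- rnbinom(N, mu = mu_2, size = phi_2)

mu_1 <- L_nb + 1                          # f3(L), assumed here as L + 1 to keep mu_1 > 0
Y <- rnbinom(N, mu = mu_1, size = phi_1)  # observed target variable

# Y and X can now be fed to the approximate Stan models (options 1 and 3)
# to check whether theta, phi_1, ... are recovered.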