Long sampling for normal parameter with low sigma or no over-dispersion

Hi there,

I have a model with a positive, discrete target variable Y \sim NegativeBinomial(\mu_1, \phi_1) and a latent variable L \sim Normal^{+}(\mu_2, \sigma_2). The true L is also positive and discrete, but since I cannot sample discrete parameters, I approximate it with a (truncated) normal distribution.

The Stan model is more complex but overall looks like this:

data {
  int<lower=1> N;           // number of observations
  int<lower=1> M;           // number of predictors
  array[N] int<lower=0> Y;  // target variable
  matrix[M, N] X;           // predictors
  ...
}

parameters {
  vector<lower=0>[N] L;     // continuous approximation of the discrete latent variable
  vector<lower=0>[N] phi_1;
  vector[M] theta;
  ...
}

transformed parameters {
  vector<lower=0>[N] mu_1;
  vector<lower=0>[N] mu_2;
  vector<lower=0>[N] sigma_2;

  ...

  mu_2 = f1(X, theta);
  sigma_2 = f2(mu_2);       // one of the three options below
  mu_1 = f3(L);
}

model {
  ...
  L ~ normal(mu_2, sigma_2);
  target += neg_binomial_2_lpmf(Y | mu_1, phi_1);
}

There are three options for how I can compute/sample L (see the note after the list on how they relate):

  1. Assume that L = \mu_2, i.e., no sampling; I just take the estimated mean.
  2. Sample L with \sigma_2 = \sqrt{\mu_2}, i.e., no over-dispersion (\rightarrow Poisson).
  3. Sample L with \sigma_2 = \sqrt{\mu_2 * (1 + \mu_2 / \phi_2)}, i.e., with over-dispersion (\rightarrow Negative Binomial).
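
(Note on how the options relate: option 3 nests the other two as limiting cases, since \sqrt{\mu_2 * (1 + \mu_2 / \phi_2)} \to \sqrt{\mu_2} as \phi_2 \to \infty, which is option 2, and since the distribution of L collapses onto \mu_2 as \sigma_2 \to 0, which is option 1.)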

I extended the model step by step to see what I can estimate and what might be “too much”. Along the way, I encountered two issues that seem odd and that I cannot explain:

1. Issue: Option 3 finishes sampling well within an hour, but option 2 seems to sample forever; so far I have always stopped it after an hour without progress. I would originally have thought that option 3 samples longer because it introduces an additional parameter that may not be feasible to estimate. Is that generally a false assumption, i.e., does it depend on the data and model as a whole whether option 2 or 3 samples faster?
2. Issue: I get quite different results for my parameters of interest \theta with option 1 than with option 3, so I wanted to do a sanity check that my implementation of option 3 is correct, although I am fairly sure it is. For that, I set \sigma_2 = 0.00001. Since \mu_2 can become quite large, this model should give (nearly) the same results as option 1… Yet, somehow, the model with \sigma_2 = 0.00001 also samples forever (I had to stop it as well). Why is that? Does it have something to do with the sampling of L being too constrained when \sigma_2 is this low?

I would be very happy if you could help me. Hopefully, I am overlooking something stupid.

Many thanks in advance.

Best,
Nic

Sorry for the slow response!

Yeah, it can happen that the model with more parameters samples faster.

A different example of this happens with the horseshoe prior (https://arxiv.org/pdf/1707.01694.pdf), where there’s an alternative parameterization based on a scale mixture of normals that uses a bunch more parameters than sampling the underlying distribution directly (look at appendix C.1 vs. C.2).
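
Roughly, the idea there (just a sketch of the general scale-mixture trick, not the paper's exact appendix code) is that a half-Cauchy scale can be rewritten as a half-normal times the square root of an inverse-gamma variable, so each heavy-tailed scale parameter is traded for two better-behaved ones. You can check the equivalence by simulation in R:

set.seed(1)
n <- 1e6

# sample a half-Cauchy(0, 1) scale directly
lambda_direct <- abs(rcauchy(n, location = 0, scale = 1))

# scale-mixture-of-normals version: |N(0, 1)| * sqrt(inv-gamma(1/2, 1/2))
z <- abs(rnorm(n))
v <- 1 / rgamma(n, shape = 0.5, rate = 0.5)
lambda_mixture <- z * sqrt(v)

# both sets of draws follow the same distribution (up to Monte Carlo error)
quantile(lambda_direct,  probs = c(0.5, 0.9, 0.99))
quantile(lambda_mixture, probs = c(0.5, 0.9, 0.99))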

Since both models 1 and 3 are approximations, I think what you want to do is generate data from a known non-approximate model (where you aren’t doing the continuous approximation) and then check how well your inferences work with approximate models 1 and 3.

I don’t think it matters so much how well model 3 can approximate model 1 (though the small sigma probably could mess up sampling); what matters is that they approximate your actual problem in some way. You’ll probably still be able to find problems this way (if model 1 fits and model 3 doesn’t, that will be suspicious).

Thank you very much, that helps, and I will check out the horseshoe prior example further!

Regarding:

generate data from a known non-approximate model (where you aren’t doing the continuous approximation)

How would I be able to generate data from the non-approximate model? The reason I am doing the approximation in the first place is that I cannot sample the latent variable from a discrete distribution.

If you assume values for all the parameters, then no need for MCMC. Do all the sampling in R (rnbinom and friends)!
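
For example (just a sketch; I'm making up simple forms for f1 and f3 here since I don't know your real ones, namely a log-linear f1 and f3(L) = L + 1, plus arbitrary "true" parameter values), generating data from the non-approximate model could look something like this:

set.seed(123)

N <- 200                      # number of observations
M <- 3                        # number of predictors
theta <- c(0.5, -0.3, 0.2)    # assumed "true" parameters of interest
phi_1 <- 10                   # assumed "true" over-dispersion of Y
phi_2 <- 5                    # assumed "true" over-dispersion of the latent variable

X <- matrix(rnorm(M * N), nrow = M, ncol = N)
mu_2 <- exp(as.vector(theta %*% X))       # f1(X, theta), assumed log-linear here

# the latent variable stays discrete: no continuous approximation
# either Poisson (no over-dispersion, matching option 2) ...
L_pois <- rpois(N, lambda = mu_2)
# ... or negative binomial (with over-dispersion, matching option 3)
L_nb <- rnbinom(N, mu = mu_2, size = phi_2)

mu_1 <- L_nb + 1                          # f3(L), assumed here as L + 1 to keep mu_1 > 0
Y <- rnbinom(N, mu = mu_1, size = phi_1)  # observed target variable

# Y and X can now be fed to the approximate Stan models (options 1 and 3)
# to check whether theta, phi_1, ... are recovered.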