Problem with sampling in a variation of the Skellam distribution (RStan)

Hmm, any chance you can roll with this? Either that or you’re going to have to figure out what is blowing up in the normalization. It probably has to do with the Bessel function in the Skellam lpmf. Those are tricky.

Probably the best way to dig into this further is to evaluate the normalization in R for a bunch of the samples from the diverging model and see what it looks like (to confirm that it's the issue).

Finally, this version of the model with truncation on [-5, 5] is sampling without problems. I observed that for [-4, 4] the divergent transitions fell dramatically from 2000 to 140, and with truncation on [-5, 5] there are no divergent transitions at all. However, my target is to sample with truncation on [-3, 3] instead of [-5, 5]. Any suggestions after this “good” news?

That is an interesting find. I’m afraid I do not know the answer here. 140 divergences is still really bad. You really want to have zero divergences even with a couple thousand posterior samples. Maybe a couple is bearable, but it’s a sign things are going south.

Can you look at your posteriors on lambda1 and lambda2 and see where the divergences are showing up? Is there a dramatic difference in the values of the posterior samples of these two variables between the [-5, 5] and [-3, 3] versions of the code?

I think I now know the source of the problem. I implemented my own random-walk Metropolis-Hastings MCMC (which works without problems; I checked) in order to find the exact cause. The conclusion was that in some games there is a large difference between \lambda_1 and \lambda_2 (e.g. \lambda_1 = 4123.5 and \lambda_2 = 123.5), so dskellam(y, \lambda_1, \lambda_2) = 0 and the log density equals -Inf. As a result, the normalization term was -Inf. Have you got any idea why this happens for [-3, 3]? I suppose it may depend on the geometry of the distribution, but I cannot fully understand why it happens.

P.S. The run with truncation on [-5, 5] was also implemented in my random-walk Metropolis-Hastings sampler and gave almost the same results as Stan.

Haha, you’re the expert here. If it helps, it sounds to me like you’re doing the right thing but I don’t have much advice beyond that.

Is the dskellam really zero under those conditions? Or is it just really really small? Is this an issue with computing probabilities on the non-logged scale? Is there a way to do them on the log scale?
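
For what it's worth, with \lambda_1 = 4123.5 and \lambda_2 = 123.5 the factor exp(-(\lambda_1 + \lambda_2)) is roughly exp(-4247), which underflows double precision to exactly zero even though the log density itself is finite, so computing dskellam on the natural scale and then taking the log will give -Inf. Here is a rough sketch of a fully log-scale version in Stan, assuming the standard Skellam parameterization (I'm guessing at how your lpmf and truncation are actually written):

functions {
  // log Skellam pmf, kept entirely on the log scale:
  // log p(k) = -(lambda1 + lambda2) + (k / 2) * log(lambda1 / lambda2)
  //            + log I_|k|(2 * sqrt(lambda1 * lambda2))
  real skellam_lpmf(int k, real lambda1, real lambda2) {
    return -(lambda1 + lambda2)
           + 0.5 * k * (log(lambda1) - log(lambda2))
           + log_modified_bessel_first_kind(abs(k), 2 * sqrt(lambda1 * lambda2));
  }

  // log of the truncation constant P(-3 <= k <= 3), accumulated with log_sum_exp
  // instead of summing raw probabilities that can underflow to zero
  real skellam_trunc_lnorm(real lambda1, real lambda2) {
    vector[7] lp;
    for (k in -3:3)
      lp[k + 4] = skellam_lpmf(k | lambda1, lambda2);
    return log_sum_exp(lp);
  }
}

You could also evaluate that second function (in R or in generated quantities) for the draws from the diverging fit, to check whether the normalization is really where things blow up.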

Maybe there is a way to put constraints or priors on parameters to prevent the \lambda s from drifting off? Priors would be preferable to constraints. There’s probably a reason the sampler is going there, and it’s always nice to try to figure out why rather than just stonewalling it.

I have solved the issue mentioned in this topic, but I would now like to carry out another task, which involves a constraint on the parameters \lambda_1 and \lambda_2.

Maybe there is a way to put constraints or priors on parameters to prevent the λs from drifting off?

I would like to prevent my parameters from taking values bigger than 500 without changing my priors. In other words, I would like to use step functions, other operators, or if-else conditionals in the parameters block, but I have read in the manual that we cannot put constraints of that kind in the parameters block (in WinBUGS this can be done). Is there any solution in Stan?

Thanks in advance!

So this code constrains the variable lambda to between zero and five hundred. Is that what you’re looking for?

real<lower = 0.0, upper = 500.0> lambda;

Not exactly, because with your suggestion I still get divergent transitions (when the parameter exceeds the upper bound). I would like to set \lambda to 500 whenever its value is greater than 500 (somewhat like a mixture). Could I define it in a way that lets me implement this idea?

Oh, if these are transformed parameters, then putting constraints on them simply checks the constraints, not enforces them. Transformed parameters and parameters are handled differently.

Probably the easiest thing to do here is add a link function of some sort. In the same way that inv_logit can be used to map (-inf, inf) to (0, 1), you could also map (-inf, inf) to (0, 500), or something. Or just pick another link function if you only have one-sided limits.
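
For instance, something like this sketch (lambda1_raw is a made-up name for an unconstrained parameter, not something from your model):

parameters {
  real lambda1_raw;  // unconstrained
}
transformed parameters {
  // inv_logit maps (-inf, inf) to (0, 1), so scaling by 500 maps onto (0, 500)
  real lambda1 = 500 * inv_logit(lambda1_raw);
}

Whatever prior you put on lambda1_raw then implies a prior on lambda1 through that transform.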

That might muddy the waters on what your priors mean, but maybe it helps. Why not tighten priors though?

Tightening the priors may help, but with a large number of parameters (43 in total for \lambda_1 and \lambda_2) this is very difficult. The model runs with priors such as normal(0, sd = 0.2), but these are too informative.

Oh okay. If this is a situation where a few parameters are big but some are small, have you looked at a sparsity prior: https://arxiv.org/abs/1707.01694 ? Picking the hyperparameters can seem a bit daunting, but the last time I used a horseshoe my model wasn't that sensitive to them. It was straightforward, if tedious, to just change them around and see what happened.
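
As a very rough sketch, a plain (non-regularized) horseshoe on some hypothetical effects looks like the following; all names are made up, and the regularized version in the paper above is usually the better choice:

parameters {
  vector[43] z;                    // standardized effects
  vector<lower = 0>[43] hs_local;  // local shrinkage scales
  real<lower = 0> hs_global;       // global shrinkage scale
}
transformed parameters {
  // horseshoe-distributed effects, e.g. something feeding into the lambdas on the log scale
  vector[43] beta = z .* hs_local * hs_global;
}
model {
  z ~ normal(0, 1);
  hs_local ~ cauchy(0, 1);   // half-Cauchy via the <lower = 0> constraint
  hs_global ~ cauchy(0, 1);  // the scale here is the main hyperparameter to experiment with
}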

Finally, I found a solution. Through an “if” statement I have put a constraint on my model parameters (for values above a limit, I force those parameters back to that limit). Although it works, narrowing the priors at least to some extent is still appropriate so that the posterior summaries are meaningful.
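
Roughly, what I did looks like the following sketch (placeholder names, not my actual model): the unconstrained parameter keeps its prior, and the capped version is what enters the likelihood.

parameters {
  real<lower = 0> lambda1;
}
transformed parameters {
  real lambda1_capped;
  // values above the limit are forced back to the limit
  if (lambda1 > 500)
    lambda1_capped = 500;
  else
    lambda1_capped = lambda1;
  // equivalently: lambda1_capped = fmin(lambda1, 500);
}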