Discrete parameter in Stan

Is there any way to estimate a discrete parameter in Stan?

I’ve read the topic on modeling discrete parameters using the CONCRETE or REBAR distribution approach. It seems that this approach has not yet been compared with marginalizing over the discrete parameter, as is done for finite mixture models in the Stan reference manual. But what if this discrete parameter is tied to the estimation of another continuous parameter?

I’ve tried to fit a mixture IRT model using Stan. The problem is that the ability parameter (theta) for each person (i) depends on the latent class (g) that the person comes from: theta(i,g).

Estimating this model in Stan is different from JAGS or WinBUGS, since the Gibbs sampling algorithm allows sampling the discrete parameter (i.e., the latent class) first.

Is it possible, for example, to declare (g) as a categorical local variable in the model block, then sample the ability parameter theta(i,g) based on this local variable?

Any idea how to solve this issue?

Thanks a lot

Discrete parameters need to be marginalized out in the model. They should also generally be marginalized out even when not strictly necessary (e.g., in other software), because of the potentially large efficiency gains from marginalization.

In a Stan program, if, for example, you marginalize a discrete parameter out in the model block but still need samples of that parameter, they can often be generated afterwards (e.g., using categorical_rng in generated quantities). There’s an example of obtaining samples of the discrete parameter in the change point section of the manual, I think.
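
For concreteness, here is a minimal sketch of that pattern: a two-component normal mixture (not the IRT model from the question; the names lambda, mu, and sigma are illustrative) where the binary indicator is marginalized out in the model block and then drawn from its conditional posterior in generated quantities.

```stan
data {
  int<lower=1> N;
  vector[N] y;
}
parameters {
  real<lower=0, upper=1> lambda;   // mixing proportion
  ordered[2] mu;                   // component means, ordered for identifiability
  real<lower=0> sigma;
}
model {
  mu ~ normal(0, 5);
  sigma ~ normal(0, 2);
  // marginalize the binary class indicator out of the likelihood
  for (n in 1:N)
    target += log_mix(lambda,
                      normal_lpdf(y[n] | mu[1], sigma),
                      normal_lpdf(y[n] | mu[2], sigma));
}
generated quantities {
  // recover draws of the discrete indicator from its conditional posterior
  int<lower=1, upper=2> z[N];
  for (n in 1:N) {
    vector[2] lp;
    lp[1] = log(lambda) + normal_lpdf(y[n] | mu[1], sigma);
    lp[2] = log1m(lambda) + normal_lpdf(y[n] | mu[2], sigma);
    z[n] = categorical_rng(softmax(lp));
  }
}
```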

Got it. I’ll try using generated quantities for sampling the discrete parameter, then.

Thank you, jonah. I really appreciate your help.

No problem. If you get stuck and can’t find the answer anywhere definitely follow up!

As described in the first example in the manual chapter on latent discrete parameters, you’re usually better off computing in expectation rather than sampling (the Rao-Blackwell theorem makes this formal).
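
Concretely, in the mixture sketch above, computing in expectation would mean replacing the categorical_rng draws with the conditional class probabilities themselves, whose posterior mean estimates class membership with lower Monte Carlo variance than sampled indicators:

```stan
generated quantities {
  // Pr(z[n] = 1 | y, lambda, mu, sigma); averaging these over draws
  // estimates the membership probability without sampling z at all
  vector[N] prob_class1;
  for (n in 1:N) {
    vector[2] lp;
    lp[1] = log(lambda) + normal_lpdf(y[n] | mu[1], sigma);
    lp[2] = log1m(lambda) + normal_lpdf(y[n] | mu[2], sigma);
    prob_class1[n] = softmax(lp)[1];
  }
}
```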

What does “computing in expectation” mean specifically? I also wonder why the manual does not give an example where the discrete parameter to be sampled follows a Poisson distribution. Is it because marginalizing over such a variable is too hard? Can you give me a simple example of a model that marginalizes out a Poisson-distributed discrete parameter?

It is harder because, in principle, you have to sum over the infinite number of non-negative integers that the discrete unknown could take on. In practice, it is doable if you sum up until the probability that the discrete unknown takes on a larger value becomes negligible (i.e., less than -30 or so on the log scale).
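
A minimal sketch of that truncation, assuming a latent Poisson(lambda) count n that shifts the mean of a normal observation (the data and likelihood here are illustrative, not the original model):

```stan
data {
  real y;
  real<lower=0> lambda;   // Poisson rate of the latent count
  int<lower=1> N_max;     // truncation point, e.g. 200
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  vector[N_max + 1] lp;
  // sum the latent count n = 0, ..., N_max out on the log scale
  for (n in 0:N_max)
    lp[n + 1] = poisson_lpmf(n | lambda)
                + normal_lpdf(y | mu + n, sigma);
  target += log_sum_exp(lp);
  mu ~ normal(0, 5);
  sigma ~ normal(0, 2);
}
```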

I understand that difficulty, but I wonder if the Stan developer community could provide a very simple example for the case of sampling a Poisson parameter. I did exactly what you mentioned (summing up to 200+, which is big enough for my model), but I still repeatedly encountered an error about rejection at the first stage when dealing with a mixture model with a Poisson prior and a normal likelihood. Is there any way to work around this error in Stan? It’s strange to me that the Stan developers seem to ignore mixture models with discrete parameters following a Poisson, since the Poisson is a very popular distribution when we sample discrete parameters.

My guess is that your model has an additional problem.

Can you help take a careful look at it? The link to the problem is here: Sampling error: Unrecoverable error evaluating the log probability at the initial value. I am not sure where the additional problem is, if that is the case.

On another note, for assessing the quality of our choice of prior and likelihood, besides plotting the obtained posterior distribution against observed values, is \hat{R} a good measure? In particular, if many \hat{R} > 1.05 (assuming our posterior is comprised of a bunch of vectors, where each component of a vector is an inferred result), should this imply not only that MCMC has not converged well within a fixed number of iterations, but also that our choice of prior is poor?

You mean using the function softmax(), as explained on page 215 of the Stan reference manual 2.17.0?

I was not successful in doing either sampling or expectation in the generated quantities block. The problem with my model is that all the parameters depend on, or are indexed by, the discrete parameter, which is the latent class (g). In other words, each latent class (g) has its own parameters for examinee ability (theta[i,g]) and item difficulty (beta[j,g]), as in the Rasch mixture model.

So, in other algorithms such as Gibbs sampling or Metropolis-Hastings, the first step is to sample the latent class, and then to sample persons’ abilities based on the latent class they come from.

If I marginalize over the discrete parameter, it seems I should also marginalize over all the model parameters, since they are all indexed by the latent class (g), which would mean no parameters are left to estimate!

I tried to get around this discrete parameter problem by sampling a real parameter, say (x), and then converting it to an integer local variable (z) in the model block using a function int_floor written by one of the Stan users on the Stan users mailing list. However, it does not work; Stan gets stuck.

If you could just add discrete parameters to Stan, or at least incorporate a function for converting a real parameter to an integer, that would be great! It would allow fitting a large number of models that are stuck with discrete parameters, and Stan would be a very flexible program in addition to being efficient. I’m still looking for a solution to fit my model in Stan, though.
Thanks

Really the only situation where I seriously feel the pain of not having discrete parameters is when I have missing count data. Otherwise I think we can keep adding to the existing collection of examples of how to write code for marginalization. Never too many good examples for different scenarios.
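
For instance, here is one sketch of the Rasch mixture case discussed above (two latent classes, illustrative priors, and no label-switching constraint, which a real model would need). The class-specific parameters theta and beta are all still estimated; only the per-person class indicator is summed out.

```stan
data {
  int<lower=1> I;                  // persons
  int<lower=1> J;                  // items
  int<lower=0, upper=1> y[I, J];   // item responses
}
parameters {
  simplex[2] pi;                   // latent class proportions
  matrix[I, 2] theta;              // ability, per person and class
  matrix[J, 2] beta;               // difficulty, per item and class
}
model {
  to_vector(theta) ~ normal(0, 1);
  to_vector(beta) ~ normal(0, 1);
  // for each person, sum the class indicator g out of the likelihood;
  // theta and beta remain parameters, indexed by class as before
  for (i in 1:I) {
    vector[2] lp = log(pi);
    for (g in 1:2)
      for (j in 1:J)
        lp[g] += bernoulli_logit_lpmf(y[i, j] | theta[i, g] - beta[j, g]);
    target += log_sum_exp(lp);
  }
}
```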

Your approach to marginalizing is OK, assuming that values outside of (0, 200) have negligible probability mass. I think in this case you can replace that Poisson with a continuous density and not bother with the marginalization.
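
For example (a hypothetical continuous stand-in, not a drop-in for the model above): since Poisson(lambda) has mean and variance lambda, one moment-matched option is a positive-constrained real with a normal density, which avoids the sum entirely. This is only sensible when lambda is large enough that the Poisson is roughly symmetric.

```stan
data {
  real y;
  real<lower=0> lambda;
}
parameters {
  real<lower=0> n_cont;   // continuous stand-in for the latent count
  real mu;
  real<lower=0> sigma;
}
model {
  // moment-matched continuous approximation to Poisson(lambda)
  n_cont ~ normal(lambda, sqrt(lambda));
  y ~ normal(mu + n_cont, sigma);
  mu ~ normal(0, 5);
  sigma ~ normal(0, 2);
}
```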

I know it’s an old post, but your statement about Rao-Blackwell made me very curious. Could you point me towards some material that explicitly links the RB theorem and efficiency/tail exploration?
Thanks!

Sorry—been off Discourse for a while.

All of the discussion around Rao-Blackwell is about efficiency; see, for instance, the Robert and Casella book on Monte Carlo methods.

Relative errors being high in tail probabilities is a feature of discrete sampling; it doesn’t have anything to do with Rao-Blackwell per se. If you only have a 0.001 probability of some discrete outcome, you are obviously going to need a lot of draws to get low relative error there by sampling.
