Mixture formula of discrete and continuous distributions


#1

Hi, I’m a beginner of Stan, and facing the following problem.
I want to set alpha[i] to inv_gamma(a, b) or just 1 depending on the probability theta, i.e.,
alpha[i] ~ inv_gamma(a, b) (with the probability theta) or
= 1 (with the probability 1-theta),
but I don’t know how to implement.
According to Chap.12.7 in the manual, there seems to be no possible method, but do you know any suitable methods ?
I would be happy if you give me some advice.
Here, alpha is a latent parameter and I don’t have direct data on it, so it is difficult to implement … If you need additional information, please let me know.
Actually, at first, I tried introducing another latent integer parameter, but I noticed it is impossible in Stan, and my question arose on the course of marginalization.


#2

You cannot assign the result of inv_gamma to anything. One of the many things you need to know about Stan before attempting to write anything in the Stan language is that the model block just builds up a log-kernel of the posterior PDF with target += statements or almost equivalently statements involving a ~. But I think what you are saying may be equivalent to a mixture model where alpha has an inverse gamma prior with probability theta and is fixed to 1 otherwise. In that case, you are going to have to use the log_mix function to handle the two possibilities.


#3

Thank you very much for your reply. I didn’t understand how to use log_mix function. Now I write the following, and my code starts sampling.
target += log_mix(theta, inv_gamma_lpdf(alpha[i] | a, b), alpha[i] == 1);
Would it mean that alpha[i] = 1 with probability 1 - theta ?


#4

Think of it as having two posterior distributions, one where theta is positive and another where theta is exactly 1. Then build the log-kernel separately in target_1 and target_2 respectively. Then target += log_mix(theta, target_1, target_2);.


#5

Thank you very much for your quick reply. At first I tried writing as follows:
target += log_mix(theta, inv_gamma_lpdf(alpha[i] | a, b), bernoulli_lpmf(alpha[i] | 1.0));.
But I already stated “vector[Tnum] alpha;” in parameters block, so cannot apply the Bernoulli distribution to alpha[i]. Do you mean that there are some other functions to produce 1 with the probability 1 ?


#6

Yes, just put 1 wherever you put alpha[i].


#7

Thank you so much for your comment. However, I don’t find a solution after considering your comment. I need your help again.
For reference, I provide my code that is simplified as follows:

data {

vector<lower=0>[Tnum] t;

}

parameters {

vector<lower=0>[Tnum] theta;
vector<lower=1>[Tnum] alpha;
}

transformed parameters {

vector[Tnum] a;
vector[Tnum] omega;
vector[Tnum] nu;

for(i in 1:Tnum){
a[i] = function(t);
omega[i] = function(a[i], alpha[i-1], omega[i-1], …);
nu[i] = function(a[i], alpha[i-1], nu[i-1], …);
}

}

model {

target += beta_lpdf(theta | beta1, beta2);
for(i in 1:Tnum) {
target += log_mix(theta[i], inv_gamma_lpdf(alpha[i] | a[i], b[i]), 1); //(**)
}

}

I think the prior distribution (**) represents what I want to implement, but there is still the lack of information that alpha[i] = 1 with the probability 1 - theta[i].
Do you mean that I may put 1 in alpha[i] in the transformed parameters block, and remove alpha[i] in parameters block ?


#8

No. I mean your model block should be something like

model {
  for (i in 1:Tnum) {
    real target_1;  // log-likelihood assuming alpha[i] != 1
    real target_2;  // log-likelihood assuming alpha[i] == 1
    // define target_1 and target_2
    target += log_mix(theta[i], target_1, target_2);
  }
  // priors
}

#9

Thank you very much. I’m convinced by your explanation, and tried classifying the likelihood, but found the number of the classification is 2^{Tnum-1} … This is because the classification of the likelihood on time “t” depends on the past observational data (I use time-series data). In my case, it may not be straightforward to write down the marginalization. There are some cases where mathematical formula are valid, but now I don’t find any useful methods on my model. Anyway, I’m studying on Stan, and looking for some other methods.


#10

The usual way to do that is dynamic programming. There’s an example for HMMs in the mixtures or time series chapter of the manual (can’t remember which as it’s both) and also for the CJS population models in the marginalizing discrete parameters chapter.