2D mixture of truncated distributions

Hi,

I’m trying to implement a mixture of 2-Dimensional truncated distributions. In one class both dimensions correspond to the same Normal, and in the other class both dimensions correspond to the same exponential. As described in the non-working code below

data {
 int <lower=0> N; // number of datapoints
 real L, U; // lower and upper limits of the observables
 array[N,2] real<lower=L, upper=U> y;  // there are 2 observables per datapoint
 real mu0, sigma0, mu1, sigma1, mu2, sigma2, t1, t2;  // hyperparameters for the parameters priors
}

parameters {
  real<lower=0> mu; 
  real<lower=0> sigma;
  real<lower=0> lambda0;
  simplex[2] theta;  
}

model {
    vector[2] lp;
    mu ~ normal(mu0, sigma0);
    sigma ~ cauchy(mu1, sigma1);
    lambda0 ~ normal(mu2, sigma2);
    theta ~ dirichlet([t1,t2]);
    
   for (n in 1:N) {
     lp[1] = normal_lpdf(y[n,1] | mu, sigma) T[L,U] + normal_lpdf(y[n,2] | mu, sigma) T[L,U];
     lp[2] = exponential_lpdf( y[n,1] | lambda0) T[L,U] + exponential_lpdf( y[n,2] | lambda0) T[L,U];     
     target += log_mix(theta, lp);                        
     };
}

I cannot make it to compile. I am naively defining a lpdf with the same notation as it is defined a sampling statement with the T[L,U] after the pdf. This is not working.

Can someone help me in how to define this model? I have the impression that the trick is to use transformed parameters, but I can’t figure out how to do it.

Thank you ! Ezequiel.

I have solved it, I leave the solution here for anyone who may find it useful!

The trick is to define the sampling PDF as the complete PDF divided the PDF area between the truncating limits. And express this as a function of the existing log-CDF and log-CCDF, and using the trick of log_diff_exp to express the log of a difference.

No one answered here, but I’m sure they would have do it after some time!

model {
    vector[2] lp;
    mu ~ normal(mu0, sigma0);
    sigma ~ cauchy(mu1, sigma1);
    lambda0 ~ normal(mu2, sigma2);
    theta ~ dirichlet([t1,t2]);
    
   for (n in 1:N) {
     lp[1] =   normal_lpdf(y[n,1] | mu, sigma)
             - log_diff_exp( normal_lcdf( U | mu, sigma), normal_lcdf( L | mu, sigma) )             
             + normal_lpdf(y[n,2] | mu, sigma)
             - log_diff_exp( normal_lcdf( U | mu, sigma), normal_lcdf( L | mu, sigma) );
     lp[2] =   exponential_lpdf( y[n,1] - L | lambda0)             
             - exponential_lcdf( U - L | lambda0) 
             + exponential_lpdf( y[n,2] - L| lambda0)
             - exponential_lcdf( U - L | lambda0);
     target += log_mix(theta, lp);                        
     };
}

Thanks. I was just coming here to answer. I thought I’d swept up all the unanswered queries a couple days ago, but it looks like I missed yours. Sorry about that.

Your answer would work even better if we had proper log cdf implementations—as is, our implementations are not super stable in the tails. We couldn’t find good log cdf function implementations anywhere, but we’re open to suggestions.

You can also avoid simplexes and Dirichlet by recoding in terms of beta distributions as

real<lower=0, upper=1> theta;
...
theta ~ beta(t1, t2);
...
target += log_mix(theta, lp[1], lp[2]);

But that won’t generalize to more than 2 mixture components.