How to specify a mixture of doubly truncated Paretos

benelson · January 12, 2018, 8:05pm

Hey all,

I have a fairly simple model I’d like to implement in Stan: a mixture of doubly truncated Pareto distributions. The K mixture components each have their own ymin and alpha, but they share a common upper truncation point, ymax.
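Written out, the likelihood that the model block below implements is the following. The notation follows the variable names in the code, and $\mathbf{1}\{\cdot\}$ is the indicator of each component's support. Dividing $y^{\alpha_k - 1}$ by `integral[k]` $= (y_{\max}^{\alpha_k} - y_{\min,k}^{\alpha_k})/\alpha_k$ gives:

```latex
p(y_n \mid \theta, \alpha, y_{\min}, y_{\max})
  = \sum_{k=1}^{K} \theta_k \,
    \frac{\alpha_k \, y_n^{\alpha_k - 1}}{y_{\max}^{\alpha_k} - y_{\min,k}^{\alpha_k}} \,
    \mathbf{1}\{ y_{\min,k} < y_n < y_{\max} \},
  \qquad \alpha_k \neq 0,
\qquad
\log p(y \mid \cdot) = \sum_{n=1}^{N} \log p(y_n \mid \theta, \alpha, y_{\min}, y_{\max})
```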

data {
    int<lower=1> K; // number of mixture components
    int<lower=0> N; // number of planets
    array[N] real<lower=0> y; // y values for N planets
}
parameters {
    simplex[K] theta; // mixing proportions
    array[K] real<lower=-10, upper=10> alpha; // power-law indices (bounds match the prior)
    array[K] real<lower=0, upper=30> ymin; // ymin for each component
    real<lower=max(ymin), upper=30> ymax; // ymax shared by all K components
}
model {
    array[K] real ps;
    array[K] real ylower;
    array[K] real yupper;
    array[K] real integral;

    alpha ~ uniform(-10, 10); // prior on alpha
    ymin ~ uniform(0, 30); // prior on ymin
    ymax ~ uniform(max(ymin), 30); // prior on ymax

    for (k in 1:K) { // works for any K, not just K = 2
        ylower[k] = ymin[k];
        yupper[k] = ymax;
        // integral of y^(alpha - 1) over (ylower, yupper); undefined at alpha = 0
        integral[k] = (yupper[k]^alpha[k] - ylower[k]^alpha[k]) / alpha[k];
    }

    for (n in 1:N) { // loop over planets
        for (k in 1:K) { // loop over mixture components
            if (y[n] > ylower[k] && y[n] < yupper[k]) {
                ps[k] = theta[k] * y[n]^(alpha[k] - 1) / integral[k];
            } else {
                ps[k] = 0;
            }
        }
        target += log(sum(ps));
    }
}

I realize Stan supports Pareto distributions (y ~ pareto(ymin, alpha)), but I’d like to allow for monotonically increasing distributions, i.e., a Pareto with a positive exponent, which in Stan’s pareto(ymin, alpha) parameterization means alpha < -1 (Stan requires alpha > 0 there).
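To see where alpha < -1 comes from, compare Stan’s Pareto density (shape written $\alpha_P$ here to keep it apart from the model’s $\alpha$) with the $y^{\alpha - 1}$ power law used in the model:

```latex
\mathrm{Pareto}(y \mid y_{\min}, \alpha_P) = \frac{\alpha_P \, y_{\min}^{\alpha_P}}{y^{\alpha_P + 1}},
  \quad y \ge y_{\min}, \ \alpha_P > 0
\qquad
-(\alpha_P + 1) = \alpha - 1 \;\Longrightarrow\; \alpha_P = -\alpha,
\qquad
\alpha - 1 > 0 \iff \alpha_P < -1
```

A positive exponent therefore lies outside the support Stan allows for $\alpha_P$, which is why the density has to be written out by hand.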

The code above works, but I doubt it’s efficient: for N = 1000 it takes roughly 0.1 s per model evaluation. Any advice on speeding up this Stan model?
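A minimal NumPy/SciPy sketch of the same log-likelihood, assuming y > 0, alpha_k != 0 and ymin_k < ymax. It differs from the loop above in two ways that are commonly suggested for speed and stability: it works on the log scale and combines components with `logsumexp` instead of `log(sum(ps))`, and it vectorizes over planets and components. The function names are hypothetical; treat it as a reference for checking the Stan model’s `target`, not as a drop-in replacement.

```python
import numpy as np
from scipy.special import logsumexp


def component_logpdf(y, alpha, ymin, ymax):
    """Log density of each y under each doubly truncated power-law component.

    Component k has density alpha_k * y**(alpha_k - 1) / (ymax**alpha_k - ymin_k**alpha_k)
    on (ymin_k, ymax) and zero (log density -inf) outside. Returns an (N, K) array.
    Assumes y > 0, alpha_k != 0 and ymin_k < ymax.
    """
    y = np.asarray(y, dtype=float)[:, None]          # shape (N, 1)
    alpha = np.asarray(alpha, dtype=float)[None, :]  # shape (1, K)
    ymin = np.asarray(ymin, dtype=float)[None, :]    # shape (1, K)
    # log of the normalizing integral (ymax^a - ymin^a) / a, positive for any a != 0
    log_norm = np.log((ymax**alpha - ymin**alpha) / alpha)
    logp = (alpha - 1.0) * np.log(y) - log_norm
    inside = (y > ymin) & (y < ymax)                 # same support test as the Stan if/else
    return np.where(inside, logp, -np.inf)


def mixture_loglik(y, theta, alpha, ymin, ymax):
    """Sum over n of log sum_k theta_k p_k(y_n), computed entirely on the log scale."""
    lp = np.log(theta)[None, :] + component_logpdf(y, alpha, ymin, ymax)
    return float(np.sum(logsumexp(lp, axis=1)))
```

In Stan, the analogous change would be to fill an array with `log(theta[k])` plus each component’s log density and add `log_sum_exp` of that array to `target`; how much that helps at N = 1000 is something you would have to measure.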

arya · January 14, 2018, 10:34pm

For mixture models you’ll want to order the means of the mixture components to avoid label-switching non-identifiability (see Betancourt’s notebook on this for more information). It’s possible this non-identifiability is slowing you down by requiring more leapfrog steps. Since you’re dealing with a Pareto distribution, you may need to reparameterize before you can impose an ordering; this isn’t as straightforward as ordering the means of Gaussians.
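A minimal sketch of the label-switching problem for this truncated power-law mixture: permuting the component labels leaves the likelihood unchanged, so with distinct components the posterior has K! equivalent modes. Ordering the components by ymin, which is a swapped-in choice here rather than the ordered means suggested above, is one way to pick a single labeling. The function names are hypothetical.

```python
import numpy as np
from scipy.special import logsumexp


def mixture_loglik(y, theta, alpha, ymin, ymax):
    """Log-likelihood of a mixture of doubly truncated power laws (y > 0, alpha != 0)."""
    y = np.asarray(y, dtype=float)[:, None]                    # shape (N, 1)
    a, lo = np.asarray(alpha, float), np.asarray(ymin, float)  # shape (K,)
    log_norm = np.log((ymax**a - lo**a) / a)
    lp = np.log(theta) + (a - 1.0) * np.log(y) - log_norm      # shape (N, K)
    lp = np.where((y > lo) & (y < ymax), lp, -np.inf)
    return float(np.sum(logsumexp(lp, axis=1)))


def canonical_order(theta, alpha, ymin):
    """Relabel components so that ymin is increasing (one way to choose a single labeling)."""
    idx = np.argsort(ymin)
    return np.asarray(theta)[idx], np.asarray(alpha)[idx], np.asarray(ymin)[idx]
```

In the Stan program itself, the analogous constraint would be an ordered type for ymin (for example `positive_ordered[K] ymin`), though the upper bound of 30 would then need separate handling.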

As far as I know, mixture components currently have to be handled with a for loop, as you did. I wonder whether a likelihood that is discontinuous in the parameters will cause problems for HMC: if ylower changes so that a data point falls into the if branch rather than the else branch (or vice versa), the likelihood jumps. What do your trace plots look like, and what tree depths are you seeing?
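A small sketch of that jump, using the per-observation term from the inner loop of the Stan model. The function name `point_loglik` and the parameter values are hypothetical. Moving the first component’s ymin from just below a data point to just above it removes that component’s contribution for the point, so the log-likelihood drops by a finite amount, while a move of the same size that doesn’t cross the point barely changes it.

```python
import numpy as np


def point_loglik(y, theta, alpha, ymin, ymax):
    """log sum_k theta_k p_k(y) for one observation, mirroring the Stan inner loop."""
    total = 0.0
    for t, a, lo in zip(theta, alpha, ymin):
        if lo < y < ymax:  # same support test as the if/else in the Stan code
            total += t * a * y ** (a - 1.0) / (ymax**a - lo**a)
    return np.log(total)


theta, alpha, ymax, y = [0.5, 0.5], [2.0, -1.5], 20.0, 3.0
# crossing the data point y = 3.0: a finite jump in the log-likelihood
jump = point_loglik(y, theta, alpha, [2.999, 1.0], ymax) - point_loglik(y, theta, alpha, [3.001, 1.0], ymax)
# a move of the same size that stays on one side of the data point
smooth = point_loglik(y, theta, alpha, [2.990, 1.0], ymax) - point_loglik(y, theta, alpha, [2.991, 1.0], ymax)
print(jump, smooth)
```

HMC follows the gradient of the log density, and that gradient carries no information about the jump, so the sampler has no warning before a trajectory crosses it.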

Bob_Carpenter · January 18, 2018, 6:57am

Yes, this can be a problem when the Hamiltonian simulator tries to cross the boundary. If the energy discrepancy is too high, it’ll just diverge (and thus get stuck on one side of the boundary).
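A toy one-dimensional illustration of this (not Stan’s actual sampler): leapfrog integration, the integrator HMC uses, on a potential U(q) = q^2/2 plus a step of height `jump` at q = 1. The step has zero gradient wherever it is differentiable, so the integrator never sees it, and a trajectory that crosses it ends with an energy error of about `jump`. In basic HMC a proposal is accepted with probability min(1, exp(-ΔH)), so a large enough step means moves across it are almost always rejected. The function and its arguments are hypothetical.

```python
def leapfrog_energy_error(q0, p0, step_size, n_steps, jump, boundary=1.0):
    """Run leapfrog on U(q) = q**2 / 2 + jump * (q > boundary); return H(end) - H(start)."""

    def potential(q):
        return 0.5 * q * q + (jump if q > boundary else 0.0)

    def grad(q):
        # gradient of the smooth part only: the step contributes nothing to the gradient
        return q

    q, p = q0, p0
    h0 = potential(q) + 0.5 * p * p
    for _ in range(n_steps):
        p -= 0.5 * step_size * grad(q)  # half step in momentum
        q += step_size * p              # full step in position
        p -= 0.5 * step_size * grad(q)  # half step in momentum
    return potential(q) + 0.5 * p * p - h0


# amplitude about 0.5: never reaches the boundary, so the energy error stays small
err_no_cross = leapfrog_energy_error(0.0, 0.5, 0.1, 10, jump=5.0)
# amplitude about 2: crosses q = 1, so the energy error is roughly the step height
err_cross = leapfrog_energy_error(0.0, 2.0, 0.1, 10, jump=5.0)
print(err_no_cross, err_cross)
```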
