Dear all, I’m a new entry of probabilistic programming community and I’m trying to fit in a proper way a weird distribution of data that looks like this:
Because of this “bimodal” and strictly positive shape, I thought to use a mixture of an exponential for the left peak and a gamma for the right one. Specifically, my stan code is the following:
data {
int<lower=0> N;
real<lower=0> y[N];
}
parameters {
real<lower=0> phi;
real<lower=0> mu;
real<lower=0> lambda; \\ rate of exponential distribution
real<lower=0, upper=1> theta; \\ param of the mixture
}
transformed parameters {
real<lower=0> alpha; \\ shape of gamma distribution
real<lower=0> beta; \\ rate of gamma distribution
\\ reparametrization of alpha and beta, parameters of gamma distribution
alpha < mu .* mu / phi;
beta < mu / phi;
}
model {
mu ~ normal(10, 10);
phi ~ normal(10, 10);
lambda ~ normal(10, 10);
theta ~ beta(1, 1);
for (n in 1:N) {
target += log_sum_exp(
log(theta) + exponential_lpdf(y[n]  lambda),
log1m(theta) + gamma_lpdf(y[n]  alpha, beta));
}
}
'''
Unfortunately, the inference was not very good. Rhats of every parameter indicate that the chains very likely have not mixed, so I guess that it is a conceptual problem. Someone can help me and indicate me what is not ok with this model? Thank you in advance!
Try a soft ordering of the exponential and gamma by using stronger priors so that the mean of the exponential is approximately near 23 (lambda ~ normal(0.5, 1)) and the mean of the gamma is near 1518.
Just from a visual inspection I’d try beta near 0.5 and alpha around 89 so
using your parameterization mu ~ normal(15, 2) and phi ~ normal(30, 3).
Also remove the <
 those are deprecated  and remove the elementwise multiplication. It’s also a bit more efficient to assign directly to alpha and beta in the declaration.
transformed parameters {
real<lower=0> alpha = square(mu) / phi; \\ shape of gamma distribution
real<lower=0> beta = mu / phi; \\ rate of gamma distribution
}
In addition, check out Michael Betancourt’s case study on identifying mixtures https://mcstan.org/users/documentation/casestudies/identifying_mixture_models.html.
Thank you spinkney for your kind answer. First of all, I am not able to make an assignment to alpha and beta in the declaration, it returns me the following error
libc++abi.dylib: terminating with uncaught exception of type std::invalid_argument
Anyway, I think that this is a secondary detail. I tried to follow your suggestions for priors, but chain mixing does not seem to improve. Here I report the fit summary table:

mean 
se_mean 
sd 
2.5% 
25% 
50% 
75% 
97.5% 
n_eff 
Rhat 
phi 
27.313887 
11.901104 
16.866117 
0.002500 
9.786976 
34.590843 
41.179883 
43.488963 
2.008426 
14.974326 
mu 
11.074136 
4.489970 
6.351390 
0.069233 
6.359057 
14.459533 
14.865302 
15.485589 
2.001017 
53.741367 
lambda 
0.654969 
0.396614 
0.566157 
0.073242 
0.087720 
0.474994 
1.212746 
1.415739 
2.037693 
7.031650 
theta 
0.390532 
0.245998 
0.348067 
0.108433 
0.119606 
0.215776 
0.708631 
0.975922 
2.001998 
33.756432 
alpha 
5.126093 
1.781273 
2.534903 
1.212825 
3.970319 
5.075746 
6.611532 
9.330145 
2.025171 
8.585616 
beta 
5.029049 
5.719404 
8.716314 
0.333556 
0.351200 
0.434583 
2.844751 
28.439209 
2.322545 
2.623867 
lp__ 
9148.808401 
232.751151 
329.195081 
9718.405121 
9400.218774 
8974.716057 
8934.799591 
8933.446004 
2.000428 
252.099822 
As you can see it is pretty depressing. If you are so kind as to try playing with those data, I leave them here: data.csv (60.9 KB)
I also tried to use different distributions to in the mixture, but little changes. Thank you again!
Even with pretty strong priors this model shows bimodality
If you pick one of the modes and constrain the parameters to be inside it’ll work. Though I’m not sure it’s “satisfying” since you’re basically saying what it should be.
Anyway, here’s the stan code
data {
int<lower=0> N;
real<lower=0> y[N];
}
parameters {
real<lower=7> alpha;
real<lower=0, upper=1> beta;
real<lower=0, upper=0.3> lambda; // rate of exponential distribution
real<lower=0, upper=1> theta; // param of the mixture
}
model {
alpha ~ normal(8, 0.1);
beta ~ normal(0.5, 0.1);
lambda ~ normal(0.1, 0.3);
theta ~ beta(1, 1);
for (n in 1:N) {
target += log_sum_exp(
log(theta) + exponential_lpdf(y[n]  lambda),
log1m(theta) + gamma_lpdf(y[n]  alpha, beta));
}
}
and the fit
Inference for Stan model: forum_mix.
4 chains, each with iter=1000; warmup=500; thin=1;
postwarmup draws per chain=500, total postwarmup draws=2000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
alpha 8.04 0.00 0.10 7.86 7.98 8.04 8.11 8.23 1104 1
beta 0.53 0.00 0.01 0.51 0.52 0.53 0.53 0.54 1043 1
lambda 0.12 0.00 0.01 0.11 0.12 0.12 0.12 0.14 804 1
theta 0.34 0.00 0.02 0.31 0.33 0.34 0.35 0.37 1003 1
lp__ 9017.67 0.05 1.46 9021.60 9018.31 9017.37 9016.63 9015.87 747 1
Samples were drawn using NUTS(diag_e) at Fri Oct 02 10:25:14 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
and the pairs plot now
1 Like
I appreciate you help a lot! This is a good starting point, I will try to relax the priors in order to make the model more flexible, but at least now I understood how to reason in presence of bimodal distributions. Thank you again!
Reminds me of the crosssections of the CoxIngersollRoss diffusion. See figures 4 and 5 at https://www.openphilanthropy.org/sites/default/files/Modelingthehumantrajectory.pdf#page=47. In figure 5, middle row, right column, vertical crosssections look rather like your distribution.