Fitting a Beta mixture model

Hello,

I am trying to fit a Beta mixture model as shown below. I reparameterized the beta distribution in terms of \phi and \lambda and ordered both \phi and \lambda to avoid label switching. In the diagnostics (beta_mixture_log.txt (1.2 KB)), several parameters have R-hat values greater than 1.01, and the log recommends a more effective parameterization. I would appreciate any suggestions. Thank you!

Input data
beta_mixture_data.txt (9.9 KB)

data {
    int<lower=1> N;
    int<lower=1> K;
    array[N] real<lower=0, upper=1> y;
}

parameters {
    simplex[K] theta;
    ordered[K] phi_raw;
    ordered[K] lambda_raw;
}

transformed parameters {
    array[K] real<lower=50, upper=100> lambda;
    array[K] real<lower=1e-3, upper=1 - 1e-3> phi;
    array[K] real<lower=0> alpha;
    array[K] real<lower=0> beta;

    for (k in 1:K) {
        phi[k] = 1e-3 + (1 - 2e-3) * inv_logit(phi_raw[k]);  // mean in (1e-3, 1 - 1e-3)
        lambda[k] = 50 + 50 * inv_logit(lambda_raw[k]);       // concentration in (50, 100)
        alpha[k] = lambda[k] * phi[k];
        beta[k]  = lambda[k] * (1 - phi[k]);
    }
}

model {
    vector[K] log_theta = log(theta);

    //Priors
    theta ~ dirichlet(rep_vector(2.0, K));
    phi_raw ~ normal(0, 2);
    lambda_raw ~ normal(0, 1);

    for (n in 1:N) {
        vector[K] lps = log_theta;
        for (k in 1:K) {
            lps[k] += beta_lpdf(y[n] | alpha[k], beta[k]);
        }
        target += log_sum_exp(lps);
    }
}

generated quantities {
    array[N] int<lower=1, upper=K> component_assignment;
    array[N, K] real log_prob_component;

    for (n in 1:N) {
        vector[K] lps = log(theta);
        for (k in 1:K) {
            lps[k] += beta_lpdf(y[n] | alpha[k], beta[k]);
        }

        vector[K] probs = softmax(lps);

        for (k in 1:K) {
            log_prob_component[n, k] = lps[k] - log_sum_exp(lps);
        }

        component_assignment[n] = categorical_rng(probs);
    }
}

Are you running into problems with standard <lower=0, upper=1> declarations? That is, does phi_raw somehow devolve to a really small or really large number prone to overflow/underflow of inv_logit?

You don’t need to do the beta reparameterization manually and reconstruct alpha and beta. You can just write

lps[k] += beta_proportion_lpdf(y[n] | phi[k], lambda[k]);
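For example (just a sketch, keeping your phi and lambda from transformed parameters and dropping alpha and beta entirely):

for (n in 1:N) {
    vector[K] lps = log_theta;
    for (k in 1:K) {
        // beta density parameterized by mean (phi) and concentration (lambda)
        lps[k] += beta_proportion_lpdf(y[n] | phi[k], lambda[k]);
    }
    target += log_sum_exp(lps);
}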

There’s a lot you can do to make things more efficient, e.g., replacing the loop over K in generated quantities with a single vectorized statement that does it collectively with only one call to log_sum_exp:

log_prob_component[n] = lps - log_sum_exp(lps);

(For this assignment to type-check, log_prob_component needs to be declared as array[N] vector[K] rather than array[N, K] real.)

The biggest source of inefficiency in Stan programs is redundant calculations.
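For instance, log(theta) is recomputed for every n in your generated quantities block. Here is a rough sketch of a leaner version (untested; it assumes the beta_proportion form above and declares log_prob_component as array[N] vector[K] so the vectorized assignment type-checks):

generated quantities {
    array[N] int<lower=1, upper=K> component_assignment;
    array[N] vector[K] log_prob_component;

    {
        vector[K] log_theta = log(theta);  // compute once, not once per observation
        for (n in 1:N) {
            vector[K] lps = log_theta;
            for (k in 1:K) {
                lps[k] += beta_proportion_lpdf(y[n] | phi[k], lambda[k]);
            }
            log_prob_component[n] = lps - log_sum_exp(lps);
            // categorical_logit_rng normalizes the unnormalized log probabilities itself
            component_assignment[n] = categorical_logit_rng(lps);
        }
    }
}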

With your categorical_rng(probs), what is your goal? If you just want to calculate means, you can use probs to calculate the expectation of component_assignment.
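For example (sketch only; expected_assignment is a name I made up), the posterior mean of the component label for each observation can be computed directly from probs rather than from a discrete draw:

// declared at the top of generated quantities
array[N] real expected_assignment;

// inside the loop over n, after computing probs = softmax(lps)
expected_assignment[n] = dot_product(linspaced_vector(K, 1, K), probs);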

I didn’t really understand what was going on with the coefficients and ranges like 50 to 100. We generally discourage this kind of hard boundary, as you typically get probability mass bunching up at the boundaries, which then biases the result relative to just having an unconstrained lambda.
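For example, something along these lines (the lognormal prior here is just an illustration of a soft constraint, not a specific recommendation for your data):

parameters {
    // other parameters unchanged
    vector<lower=0>[K] lambda;  // positive, with no hard upper bound
}
model {
    // other priors and the likelihood unchanged
    lambda ~ lognormal(log(75), 0.5);  // gentle preference for values near 50-100
}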

To avoid label switching, you only need to order one of the two (and I would declare the mean as an ordered vector). Currently, you are forcing the model to pair the lowest lambda with the lowest phi, the highest with the highest, and so on, which is not necessarily the case, right? As long as the mean vector (phi) is ordered, it should be possible to identify and estimate the concentration (lambda) associated with each mean.
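Concretely, something like this (a sketch; phi_logit is a name I made up) keeps the ordering only on the means:

parameters {
    simplex[K] theta;
    ordered[K] phi_logit;        // ordering the means on the logit scale identifies the labels
    vector<lower=0>[K] lambda;   // concentrations left unordered
}
transformed parameters {
    vector<lower=0, upper=1>[K] phi = inv_logit(phi_logit);
}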

Thank you both for your suggestions!

The issue was that label switching was not affecting posterior inference but was affecting the convergence diagnostics. For instance, the R-hat values look good for a single chain. This was previously discussed in this post. In that post, it was suggested that post-hoc relabelling would not be appropriate if the uncertainties of the components overlap. I’d welcome suggestions for any other methods to estimate convergence across all 4 chains!

I removed the ordering on the \lambda as suggested!

I added the hard constraints simply to get convergence (even with one chain). I will remove them and use stronger priors instead.

Unless you can identify the model, standard convergence monitoring of the parameter means with R-hat will not work.

What will work is monitoring convergence of quantities of interest, like posterior predictive quantities. Those typically marginalize out the mixture component. I wrote a bit about this in the User’s Guide chapter on mixtures.
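For example (a sketch; y_rep and z are names I made up), a posterior predictive draw that averages over the mixture components is invariant to relabeling, so its R-hat is meaningful across chains:

generated quantities {
    real y_rep;  // one posterior predictive draw per iteration
    {
        int z = categorical_rng(theta);                  // draw a component
        y_rep = beta_proportion_rng(phi[z], lambda[z]);  // then draw y given that component
    }
}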