Standardizing data for mixture model

I’m fitting a univariate Gaussian mixture model to a dataset whose component means are relatively large (100+).

set.seed(1)
n <- 10000
mu <- c(120, 140, 160)     # component means
sigma <- c(4, 3, 5)        # component standard deviations
alpha <- c(0.2, 0.5, 0.3)  # mixture weights
K <- length(alpha)

z <- sample(K, n, replace = TRUE, prob = alpha)  # latent component assignments
y <- rnorm(n, mu[z], sigma[z])                   # observations

My understanding of Stan, and my own experimentation, suggest that standardizing the data (transforming it to have mean 0 and standard deviation 1) will increase the efficiency of the sampler. However, I’m stuck on transforming the priors. I have fairly strong priors on the component standard deviations (parameterized in terms of mean and standard deviation, then transformed to shape and rate). In the current model it looks something like this:

data {
    // data
    int<lower=1> N; // number of observations
    int<lower=1> K; // number of mixture components
    vector[N] y; // count of fragments of size n

    // priors
    vector[K] mu0;          // prior means of mu
    real<lower=0> sigma0;   // prior sd of mu

    vector<lower=0>[K] sigma_mu;    // prior mean of each component sd
    vector<lower=0>[K] sigma_gamma; // prior sd of each component sd

    vector<lower=0>[K] alpha;       // Dirichlet concentrations
}

transformed data {
    // convert the prior mean (sigma_mu) and sd (sigma_gamma) of each component sd
    // into the shape (alpha1) and rate (beta1) of a gamma distribution
    vector<lower=0>[K] alpha1;
    vector<lower=0>[K] beta1;

    for (k in 1:K) {
        alpha1[k] = pow(sigma_mu[k], 2) / pow(sigma_gamma[k], 2);
        beta1[k] = sigma_mu[k] / pow(sigma_gamma[k], 2);
    }
}

parameters {
    vector[K] mu;
    vector<lower=0>[K] sigma;
    simplex[K] theta;
}

model {
    vector[K] contributions; // per-component log density contributions

    // prior
    mu ~ normal(mu0, sigma0);
    sigma ~ gamma(alpha1, beta1);
    theta ~ dirichlet(alpha);

    // marginalise over the discrete component assignment
    for (n in 1:N) {
        for (k in 1:K) {
            contributions[k] = log(theta[k]) + normal_lpdf(y[n] | mu[k], sigma[k]);
        }

        target += log_sum_exp(contributions);
    }
}

However, if I create y_std = (y - mean(y)) / sd(y), how would I modify sigma_mu and sigma_gamma?

Maybe! This is not universally true - just mostly.

I don’t think it’s obvious exactly what you should do – my best guess would be to divide both by sd(y), but it’s not clear to me what the rescaling does to the scales of the components.
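For what it’s worth, a minimal R sketch of what that best guess would look like when assembling the data list: everything on the scale of y (mu0, sigma0, sigma_mu, sigma_gamma) gets shifted and/or divided by sd(y), while the Dirichlet concentrations are scale-free and stay as they are. The hyperparameter values below are placeholders, and whether this is the right rescaling is exactly the open question.

m <- mean(y)
s <- sd(y)
y_std <- (y - m) / s

stan_data_std <- list(
  N = n,
  K = K,
  y = y_std,
  mu0 = (c(120, 140, 160) - m) / s,  # locations: shift and scale
  sigma0 = 10 / s,                   # prior sd on mu: scale only (placeholder value)
  sigma_mu = c(4, 3, 5) / s,         # guess: prior mean of sigma divided by sd(y)
  sigma_gamma = c(1, 1, 1) / s,      # guess: prior sd of sigma divided by sd(y)
  alpha = rep(1, K)                  # simplex prior is scale-free
)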

Maybe a better alternative would be to initialise the chains by sampling from the priors defined by alpha1 and beta1 (which involves moving the transformed data computation out of the Stan code), and using your preferred initialisation scheme for the other parameters. The sampler’s adaptation should then still produce pretty high effective sample sizes.
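If you go that route, a rough sketch of the initialisation in R (assuming rstan; the hyperparameter values are placeholders, and alpha1/beta1 are now computed in R rather than in transformed data):

mu0 <- c(120, 140, 160)
sigma0 <- 10                       # placeholder prior sd on mu
sigma_mu <- c(4, 3, 5)             # prior mean of each component sd
sigma_gamma <- c(1, 1, 1)          # placeholder prior sd of each component sd
alpha_prior <- rep(1, K)           # Dirichlet concentrations

alpha1 <- sigma_mu^2 / sigma_gamma^2  # gamma shape
beta1  <- sigma_mu / sigma_gamma^2    # gamma rate

init_fun <- function() {
  g <- rgamma(K, alpha_prior, 1)             # Dirichlet draw via normalised gammas
  list(
    mu    = sort(rnorm(K, mu0, sigma0)),     # sorted so components start in increasing order
    sigma = rgamma(K, alpha1, rate = beta1), # draws from the gamma prior on sigma
    theta = g / sum(g)
  )
}

# fit <- rstan::stan("mixture.stan", data = stan_data, init = init_fun, chains = 4)
# ("mixture.stan" and stan_data stand in for whatever you already use to fit the model)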

Shouldn’t mu here be an ordered vector? I think you might also be able to declare it using the new offset syntax discussed here (I don’t know if you can declare parameters as ordered<offset = 100>[K] mu; but it might be worth trying).

Another option would be to simulate data with distinct components (as you do), rescale the data, and compute the standard deviation of each component. Doing this for many combinations of component means and standard deviations will tell you (empirically) what transformation is being applied, and how to rescale your prior.
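A quick sketch of that check in R, reusing the simulation from the top of the thread and the known assignments z: in this simulation the per-component standard deviations of the rescaled data come out very close to sigma / sd(y), and the per-component means close to (mu - mean(y)) / sd(y).

y_std <- (y - mean(y)) / sd(y)

# empirical per-component mean and sd after rescaling
mean_emp <- tapply(y_std, z, mean)
sd_emp   <- tapply(y_std, z, sd)

round(rbind(empirical_sd = sd_emp, sigma_over_sd_y = sigma / sd(y)), 4)
round(rbind(empirical_mean = mean_emp, rescaled_mu = (mu - mean(y)) / sd(y)), 4)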