Efficiency and failed initialization for Gaussian mixture model

In the <Stan Modeling Language: User’s Guide and Reference Manual>, the Gaussian mixture code is in page 116

data {
int<lower=1> K; // number of mixture components
int<lower=1> N; // number of data points
real y[N]; // observations
}
parameters {
simplex[K] theta; // mixing proportions
real mu[K]; // locations of mixture components
real<lower=0> sigma[K]; // scales of mixture components
}

model {
real ps[K]; // temp for log component densities
sigma ~ cauchy(0, 2.5);
mu ~ normal(0, 10);
for (n in 1:N) {
for (k in 1:K) {
ps[k] = log(theta[k])

  • normal_lpdf(y[n] | mu[k], sigma[k]);
    }
    target += log_sum_exp(ps);
    }
    }

I slightly adjusted this code for my own goal, especially adding Dirichlet prior onto w, with the following code.

data {
int<lower=1> C; // number of mixture components
int<lower=1> V; // number of data points
real s[V]; // observations, from textbook
}
parameters {
simplex[C] w; // mixing proportions
real mu[C]; // locations of mixture components, from textbook
real<lower=0> sigma[C]; // scales of mixture components, from textbook
}
model {
vector[C] ps; // temp for log component densities, added by Ziyi
vector[V] lp_obs; //added by Ziyi
w ~ dirichlet(rep_vector(2,C)); //Added by Ziyi
mu ~ normal(0, 10);
sigma ~ cauchy(0, 2.5);

for (n in 1:V) {
for (k in 1:C) {
ps[k] = log(w[k])+ normal_lpdf(s[n] | mu[k], sigma[k]);
}
lp_obs[n]=log_sum_exp(ps);
}
target+=sum(lp_obs); //added by Ziyi
}

Even with
iter=2000, warmup=1000, chains=1, cores=1, control=list(adapt_delta=0.9)

it takes 7198.78s for the 2000 iterations. I remember when I use random walk proposal in R without C++, each iteration won’t exceed 1s. This code is slightly adjusted from manual. I used lp_obs and sum up together, which should be faster than the code from manual (see manual page 239, “Vectorizing Summations”)

How can we make it faster?

If I use Stan in Julia or Python, will it be faster than to use Stan in R?

BTW, when I change "w ~ dirichlet(rep_vector(2,C)); " into "w ~ dirichlet(rep_vector(1/C,C)); " to encourage sparsity in w, then the initialization fails.

Thanks.
Ziyi

Later I found warmup is a big issue:

7198.78s for the 2000 iterations, warmup=1000
71.42s for the 2000 iterations, warmup=10
25.67s for the 2000 iterations, warmup=0

So now I wanna know how to solve failed initialization after changing
w~Dirichlet(2, C) to w~Dirichlet(1/C, C), for C=10 Gaussian mixture components.

Later I found 1/C for integer C is rounded to 0, so I used 1.0/(C+0.0) and now it can run.