# Fitting a mixture of two von Mises distributions

I would like to fit a mixture of two von Mises distributions to data points with two peaks, at 0 and π. Below is the complete script that I have written for this purpose:

```r
library(rstan)
rstan_options(auto_write = TRUE)

model_code <- "
data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  simplex[2] theta;
  vector[2] kappa_link;
  vector[2] mu_link;
}
model {
  vector[2] ps;
  vector[2] kappa = exp(kappa_link);
  vector[2] mu = 2*atan(mu_link);
  for (n in 1:N) {
    if (kappa[1] < 100) {
      ps[1] = log(theta[1]) + von_mises_lpdf(y[n] | mu[1], kappa[1]);
    }
    else {
      ps[1] = log(theta[1]) + normal_lpdf(y[n] | mu[1], sqrt(1/kappa[1]));
    }
    if (kappa[2] < 100) {
      ps[2] = log(theta[2]) + von_mises_lpdf(y[n] | mu[2], kappa[2]);
    }
    else {
      ps[2] = log(theta[2]) + normal_lpdf(y[n] | mu[2], sqrt(1/kappa[2]));
    }
    target += log_sum_exp(ps);
  }
}"

N <- 200
weight <- 0.7
y1 <- rep(0, N*weight)
y2 <- rep(pi, N*(1-weight))
Y <- c(y1, y2)

model <- stan_model(model_code = model_code, allow_undefined = TRUE,
                    verbose = TRUE)
m <- sampling(model, data = list(N=N, y=Y), chains = 1)
```

```r
m_summary <- summary(m)
vals <- m_summary$summary[, 'mean']
kappa1 <- exp(vals['kappa_link[1]'])
kappa2 <- exp(vals['kappa_link[2]'])
mu1 <- 2*atan(vals['mu_link[1]'])
mu2 <- 2*atan(vals['mu_link[2]'])
print(c(kappa1, kappa2, mu1, mu2))
```

At the end of the sampling run, I am getting the following warning messages:

```
1: There were 971 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
2: Examine the pairs() plot to diagnose sampling problems
3: The largest R-hat is 1.93, indicating chains have not mixed. Running the chains for more iterations may help. See http://mc-stan.org/misc/warnings.html#r-hat
4: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable. Running the chains for more iterations may help. See http://mc-stan.org/misc/warnings.html#bulk-ess
5: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
```

I have looked at the links above and increased max_treedepth above 10, but I am not seeing any improvement. I am unsure what steps I can take to improve the model. I appreciate any help moving forward.

---

Do you know for certain that the two distributions are centered on 0 and π? If so, you should get rid of the mu parameter and use those values for the locations instead. If instead you mean that you are pretty confident the locations are 0 and π, then keep mu but use a prior to convey that confidence. Right now your prior on the two locations implies that they can both be pretty much anywhere in the parameter space, which I suspect is leading to label-switching issues.

Also, I suggest using a single variable for the probability parameter rather than a simplex; a scalar permits easier specification of priors, which in turn could help your model if you do have expectations about the ratio of the two classes.
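To make the second suggestion concrete, here is a minimal sketch (my own illustration, not code from the thread) of a two-component mixture with a single bounded mixing proportion; the `beta(7, 3)` prior is hypothetical, chosen only to put most mass near 0.7:

```stan
data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real<lower=0, upper=1> theta;  // scalar mixing proportion instead of simplex[2]
  vector[2] mu_link;
  vector[2] kappa_link;
}
model {
  vector[2] mu = 2 * atan(mu_link);   // map unconstrained params to (-pi, pi)
  vector[2] kappa = exp(kappa_link);  // keep concentrations positive
  theta ~ beta(7, 3);                 // hypothetical prior with mass near 0.7
  for (n in 1:N)
    target += log_sum_exp(
      log(theta)   + von_mises_lpdf(y[n] | mu[1], kappa[1]),
      log1m(theta) + von_mises_lpdf(y[n] | mu[2], kappa[2]));
}
```

Because `theta` is declared with `<lower=0, upper=1>`, Stan samples it on an unconstrained scale internally and the bound is enforced automatically, so any prior density on (0, 1) can be stated directly.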
---

Thanks Mike for kindly looking into my problem. I know that the two peaks are likely to be near 0 and π. To keep mu and convey that confidence, is this how I should do it using a prior?

```stan
mu[1] = 2*atan(mu_link[1]);
mu[2] = 2*atan(mu_link[2]) + pi();
```

For the probability parameter theta, say for example that I know it is in the vicinity of 0.7, is this a good way to set up the model with a prior?

```stan
parameters {
  real theta;
}
model {
  theta ~ normal(0.7, 0.5);
}
```

How can I ensure that theta stays in the range [0, 1]? Thanks!

---

Here's what I'd recommend you try:

```stan
data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real theta_log_odds;
  vector[2] kappa_link;
  vector[2] mu;
}
model {
  vector[2] ps;
  real theta = inv_logit(theta_log_odds);
  vector[2] kappa = exp(kappa_link); // I think exp() is vectorized now
  kappa_link ~ normal(0, 2);
  mu[1] ~ von_mises(0, 30); // check that this matches your knowledge
  mu[2] ~ von_mises(pi(), 30); // check that this matches your knowledge
  theta_log_odds ~ normal(1, 1); // check that this matches your knowledge
  for (n in 1:N) {
    if (kappa[1] < 100) {
      ps[1] = log(theta) + von_mises_lpdf(y[n] | mu[1], kappa[1]);
    }
    else {
      ps[1] = log(theta) + normal_lpdf(y[n] | mu[1], sqrt(1/kappa[1]));
    }
    if (kappa[2] < 100) {
      ps[2] = log1m(theta) + von_mises_lpdf(y[n] | mu[2], kappa[2]);
    }
    else {
      ps[2] = log1m(theta) + normal_lpdf(y[n] | mu[2], sqrt(1/kappa[2]));
    }
    target += log_sum_exp(ps);
  }
}
```

---

Thanks Mike for your suggestion. It is a nice idea to use von_mises instead of normal for the priors on mu. I have tried it with the complete script below, but I am still getting unexpected results:

```
kappa1: 6.418781e-02
kappa2: 1.153151e+12
mu1: -5.635822e-01
mu2: 3.680773e-09
theta: 3.057110e-01
```

Do you have any other ideas?
```r
library(rstan)
rstan_options(auto_write = TRUE)

model_code <- "
data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real theta_log_odds;
  vector[2] kappa_link;
  vector[2] mu;
}
model {
  vector[2] ps;
  real theta = inv_logit(theta_log_odds);
  vector[2] kappa = exp(kappa_link);
  kappa_link ~ normal(0, 2);
  mu[1] ~ von_mises(0, 30);
  mu[2] ~ von_mises(pi(), 30);
  theta_log_odds ~ normal(1, 1);
  for (n in 1:N) {
    if (kappa[1] < 100) {
      ps[1] = log(theta) + von_mises_lpdf(y[n] | mu[1], kappa[1]);
    }
    else {
      ps[1] = log(theta) + normal_lpdf(y[n] | mu[1], sqrt(1/kappa[1]));
    }
    if (kappa[2] < 100) {
      ps[2] = log1m(theta) + von_mises_lpdf(y[n] | mu[2], kappa[2]);
    }
    else {
      ps[2] = log1m(theta) + normal_lpdf(y[n] | mu[2], sqrt(1/kappa[2]));
    }
    target += log_sum_exp(ps);
  }
}"

N <- 200
theta <- 0.7
y1 <- rep(0, N*theta)
y2 <- rep(pi, N*(1-theta))
Y <- c(y1, y2)

model <- stan_model(model_code = model_code, allow_undefined = TRUE,
                    verbose = TRUE)
m <- sampling(model, data = list(N=N, y=Y), chains = 1)

m_summary <- summary(m)
vals <- m_summary$summary[, 'mean']
```
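One thing worth checking (my suggestion, not something raised in the thread): the simulated data are exact point masses at 0 and π, with zero dispersion around each peak. The von Mises density at its mean grows without bound as kappa increases, so the likelihood for a component sitting exactly on its data is maximized as kappa → ∞, which is consistent with the enormous `kappa2` reported above. A sketch of simulating dispersed angles instead, using wrapped-normal noise (`wrap` is a hypothetical helper I am introducing here, and the `sd = 0.3` noise level is arbitrary):

```r
# Simulate two noisy clusters of angles instead of exact point masses.
set.seed(2)
N <- 200
theta <- 0.7
wrap <- function(x) atan2(sin(x), cos(x))  # map any angle into [-pi, pi]
y1 <- wrap(rnorm(round(N * theta), mean = 0, sd = 0.3))
y2 <- wrap(rnorm(round(N * (1 - theta)), mean = pi, sd = 0.3))
Y <- c(y1, y2)
```

With data like this, each component has finite spread around its location, so the concentration parameters have a well-defined posterior instead of drifting toward infinity.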