Fitting a mixture of two von Mises distributions

I would like to fit a mixture of two von Mises distributions to data points with two peaks, at 0 and \pi. Below is the complete script I have written for this purpose:

library(rstan)
rstan_options(auto_write = TRUE)

model_code <- "
data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  simplex[2] theta;
  vector[2] kappa_link;
  vector[2] mu_link;
}
model {
  real ps[2];
  real kappa[2];
  real mu[2];
  mu_link ~ normal(0,5);
  kappa_link ~ normal(0,2);
  // 2*atan() maps the unconstrained mu_link onto (-pi, pi)
  mu[1] = 2*atan(mu_link[1]);
  mu[2] = 2*atan(mu_link[2]);
  // exp() keeps the concentration parameters positive
  kappa[1] = exp(kappa_link[1]);
  kappa[2] = exp(kappa_link[2]);
  // for very large kappa the von Mises density is numerically unstable
  // (overflow in the modified Bessel function), so above the cutoff
  // approximate it with normal(mu, 1/sqrt(kappa))
  for (n in 1:N) {
    if (kappa[1] < 100) {
      ps[1] = log(theta[1]) + von_mises_lpdf(y[n] | mu[1], kappa[1]);
    }
    else  {
      ps[1] = log(theta[1]) + normal_lpdf(y[n] | mu[1], sqrt(1/kappa[1]));
    }
    if (kappa[2] < 100) {
      ps[2] = log(theta[2]) + von_mises_lpdf(y[n] | mu[2], kappa[2]);
    }
    else {
      ps[2] = log(theta[2]) + normal_lpdf(y[n] | mu[2], sqrt(1/kappa[2]));
    }
    target += log_sum_exp(ps);
  }
}"

N <- 200
weight <- 0.7
y1 <- rep(0, N*weight)       # 140 points exactly at 0 (no angular noise)
y2 <- rep(pi, N*(1-weight))  # 60 points exactly at pi
Y <- c(y1, y2)

model <- stan_model(model_code = model_code, allow_undefined = TRUE,
                    verbose = TRUE)
m <- sampling(model, data = list(N=N, y=Y), chains = 1)

m_summary <- summary(m)
vals <- m_summary$summary[, 'mean'] 
kappa1 <- exp(vals['kappa_link[1]'])
kappa2 <- exp(vals['kappa_link[2]'])
mu1 <- 2*atan(vals['mu_link[1]'])
mu2 <- 2*atan(vals['mu_link[2]'])
print(c(kappa1, kappa2, mu1, mu2))

At the end of the sampling run, I am getting the following warning messages:
1: There were 971 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
2: Examine the pairs() plot to diagnose sampling problems
3: The largest R-hat is 1.93, indicating chains have not mixed. Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat
4: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable. Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess
5: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.

I have looked at the links provided above and increased max_treedepth above 10, but I am not seeing any improvement. I am unsure what steps I can take to improve the model. I would appreciate any help in moving forward.

Do you know for certain that the two distributions are centered on 0 and pi? If so, you should get rid of the mu parameter and use those values for the locations instead.
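
For the known-locations option, here is a minimal sketch of what I mean (it reuses the kappa setup from your model and drops the large-kappa guard for brevity):

data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  simplex[2] theta;
  vector[2] kappa_link;
}
model {
  real ps[2];
  vector[2] kappa = exp(kappa_link);
  kappa_link ~ normal(0,2);
  for (n in 1:N) {
    // locations fixed at the known values 0 and pi
    ps[1] = log(theta[1]) + von_mises_lpdf(y[n] | 0, kappa[1]);
    ps[2] = log(theta[2]) + von_mises_lpdf(y[n] | pi(), kappa[2]);
    target += log_sum_exp(ps);
  }
}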

If instead you mean that you’re pretty confident the locations are 0 and \pi, then keep mu but use a prior to convey that confidence. Right now your prior on the two locations implies that each can be pretty much anywhere in the parameter space, which I suspect is leading to label-switching issues.


Also, I suggest using a single variable for the probability parameter rather than a simplex; a single variable permits easier specification of priors, which in turn could help your model if you do have substantive expectations about the ratio of the two classes.


Thanks Mike for kindly looking into my problem. I know that the two peaks are likely to be near 0 and \pi. To keep \mu and convey that confidence, is this how I should do it using a prior:

  mu[1] = 2*atan(mu_link[1]);
  mu[2] = 2*atan(mu_link[2]) + pi();

For the probability parameter theta, say for example that I know it is in the vicinity of 0.7; is this a good way to set up the model with a prior:

parameters {
  real theta;
}
model {
  theta ~ normal(0.7, 0.5);
}

How can I ensure that theta stays in the range [0,1]? Would declaring it with bounds and a beta prior work, something like this (the beta(7,3) is just my guess at how to encode a value near 0.7):
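
parameters {
  real<lower=0, upper=1> theta;  // bounds keep theta in [0, 1]
}
model {
  theta ~ beta(7, 3);  // prior mean 7/(7+3) = 0.7
}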
Thanks!

Here’s what I’d recommend you try:

data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real theta_log_odds;
  vector[2] kappa_link;
  vector[2] mu;
}
model {
  real ps[2];
  real theta = inv_logit(theta_log_odds) ;
  vector[2] kappa = exp(kappa_link); // I think exp() is vectorized now

  kappa_link ~ normal(0,2);
  mu[1] ~ von_mises(0, 30) ; // check that this matches your knowledge
  mu[2] ~ von_mises(pi(), 30) ; // check that this matches your knowledge
  theta_log_odds ~ normal(1,1) ; // check that this matches your knowledge
  
  for (n in 1:N) {
    if (kappa[1] < 100) {
      ps[1] = log(theta) + von_mises_lpdf(y[n] | mu[1], kappa[1]);
    }
    else  {
      ps[1] = log(theta) + normal_lpdf(y[n] | mu[1], sqrt(1/kappa[1]));
    }
    if (kappa[2] < 100) {
      ps[2] = log1m(theta) + von_mises_lpdf(y[n] | mu[2], kappa[2]);
    }
    else {
      ps[2] = log1m(theta) + normal_lpdf(y[n] | mu[2], sqrt(1/kappa[2]));
    }
    target += log_sum_exp(ps);
  }
}
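
One thing to note: theta and kappa are local variables in the model block, so they won't show up in the sampler output; after sampling you can reconstruct them from the theta_log_odds and kappa_link draws (inverse-logit and exp, respectively).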

Thanks Mike for your suggestion. It is a nice idea to use von_mises instead of normal for the priors on mu. I have tried it with the complete script below, but I am still getting unexpected results:
kappa1: 6.418781e-02
kappa2: 1.153151e+12
mu1: -5.635822e-01
mu2: 3.680773e-09
theta: 3.057110e-01
Do you have any other ideas?

library(rstan)
rstan_options(auto_write = TRUE)

model_code <- "
data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real theta_log_odds;
  vector[2] kappa_link;
  vector[2] mu;
}
model {
  real ps[2];
  real theta = inv_logit(theta_log_odds);
  vector[2] kappa = exp(kappa_link);

  kappa_link ~ normal(0,2);
  mu[1] ~ von_mises(0, 30);
  mu[2] ~ von_mises(pi(), 30); 
  theta_log_odds ~ normal(1,1); 
  
  for (n in 1:N) {
    if (kappa[1] < 100) {
      ps[1] = log(theta) + von_mises_lpdf(y[n] | mu[1], kappa[1]);
    }
    else  {
      ps[1] = log(theta) + normal_lpdf(y[n] | mu[1], sqrt(1/kappa[1]));
    }
    if (kappa[2] < 100) {
      ps[2] = log1m(theta) + von_mises_lpdf(y[n] | mu[2], kappa[2]);
    }
    else {
      ps[2] = log1m(theta) + normal_lpdf(y[n] | mu[2], sqrt(1/kappa[2]));
    }
    target += log_sum_exp(ps);
  }
}"


N <- 200
theta <- 0.7
y1 <- rep(0, N*theta)
y2 <- rep(pi, N*(1-theta))
Y <- c(y1, y2)

model <- stan_model(model_code = model_code, allow_undefined = TRUE,
                    verbose = TRUE)
m <- sampling(model, data = list(N=N, y=Y), chains = 1)

m_summary <- summary(m)
vals <- m_summary$summary[, 'mean'] 
kappa1 <- exp(vals['kappa_link[1]'])
kappa2 <- exp(vals['kappa_link[2]'])
mu1 <- vals['mu[1]']
mu2 <- vals['mu[2]']
theta <- 1/(1+exp(-vals['theta_log_odds']))
print(c(kappa1, kappa2, mu1, mu2, theta))

The results are unstable; sometimes it gives correct results, and I am still getting the same warning messages. If you have any other ideas, I can give them a try.

Ah, shoot, I forgot that with von Mises mixtures like this you have to take care to keep away from the area of the parameter space where one or both of the components becomes uniform. Try priors on kappa that are tighter and/or shifted toward higher values.
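
For instance (the numbers here are just illustrative; tune them to what you actually know about the concentrations), replacing the kappa_link prior:

  kappa_link ~ normal(2, 0.5); // implies kappa = exp(kappa_link) roughly between 2.7 and 20 (within +/- 2 sd), away from the uniform regime near kappa = 0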
