Fitting a mixture of two von Mises distributions

I would like to fit a mixture of two von Mises distributions to data points with two peaks, at 0 and \pi. Below is the complete script I have written for this purpose:

library(rstan)
rstan_options(auto_write = TRUE)

model_code <- "
data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  simplex[2] theta;
  vector[2] kappa_link;
  vector[2] mu_link;
}
model {
  real ps[2];
  real kappa[2];
  real mu[2];
  mu_link ~ normal(0,5);
  kappa_link ~ normal(0,2);
  // 2*atan() maps the unconstrained mu_link onto (-pi, pi)
  mu[1] = 2*atan(mu_link[1]);
  mu[2] = 2*atan(mu_link[2]);
  // exp() keeps the concentration parameters positive
  kappa[1] = exp(kappa_link[1]);
  kappa[2] = exp(kappa_link[2]);
  // for very large kappa the von Mises density is numerically unstable
  // (overflow in the modified Bessel function), so above the cutoff
  // approximate it with normal(mu, 1/sqrt(kappa))
  for (n in 1:N) {
    if (kappa[1] < 100) {
      ps[1] = log(theta[1]) + von_mises_lpdf(y[n] | mu[1], kappa[1]);
    }
    else  {
      ps[1] = log(theta[1]) + normal_lpdf(y[n] | mu[1], sqrt(1/kappa[1]));
    }
    if (kappa[2] < 100) {
      ps[2] = log(theta[2]) + von_mises_lpdf(y[n] | mu[2], kappa[2]);
    }
    else {
      ps[2] = log(theta[2]) + normal_lpdf(y[n] | mu[2], sqrt(1/kappa[2]));
    }
    target += log_sum_exp(ps);
  }
}"

N <- 200
weight <- 0.7
y1 <- rep(0, N*weight)       # 140 points exactly at 0 (no angular noise)
y2 <- rep(pi, N*(1-weight))  # 60 points exactly at pi
Y <- c(y1, y2)

model <- stan_model(model_code = model_code, allow_undefined = TRUE,
                    verbose = TRUE)
m <- sampling(model, data = list(N=N, y=Y), chains = 1)

m_summary <- summary(m)
vals <- m_summary$summary[, 'mean'] 
kappa1 <- exp(vals['kappa_link[1]'])
kappa2 <- exp(vals['kappa_link[2]'])
mu1 <- 2*atan(vals['mu_link[1]'])
mu2 <- 2*atan(vals['mu_link[2]'])
print(c(kappa1, kappa2, mu1, mu2))

At the end of the sampling run, I am getting the following warning messages:
1: There were 971 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
2: Examine the pairs() plot to diagnose sampling problems
3: The largest R-hat is 1.93, indicating chains have not mixed. Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat
4: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable. Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess
5: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.

I have looked at the links provided above and increased max_treedepth above 10, but I am not seeing any improvement. I am unsure what steps I can take to improve the model. I would appreciate any help in moving forward.

Do you know for certain that the two distributions are centered on 0 and pi? If so, you should get rid of the mu parameter and use those values for the locations instead.
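
For the known-locations option, here is a minimal sketch of what I mean (it reuses the kappa setup from your model and drops the large-kappa guard for brevity):

data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  simplex[2] theta;
  vector[2] kappa_link;
}
model {
  real ps[2];
  vector[2] kappa = exp(kappa_link);
  kappa_link ~ normal(0,2);
  for (n in 1:N) {
    // locations fixed at the known values 0 and pi
    ps[1] = log(theta[1]) + von_mises_lpdf(y[n] | 0, kappa[1]);
    ps[2] = log(theta[2]) + von_mises_lpdf(y[n] | pi(), kappa[2]);
    target += log_sum_exp(ps);
  }
}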

If instead you mean that you’re pretty confident the locations are 0 and \pi, then keep mu but use a prior to convey that confidence. Right now your prior on the two locations implies that each can be pretty much anywhere in the parameter space, which I suspect is leading to label-switching issues.


Also, I suggest using a single variable for the probability parameter rather than a simplex; a single variable permits easier specification of priors, which in turn could help your model if you do have substantive expectations about the ratio of the two classes.


Thanks Mike for kindly looking into my problem. I know that the two peaks are likely to be near 0 and \pi. To keep \mu and convey that confidence, is this how I should do it using a prior:

  mu[1] = 2*atan(mu_link[1]);
  mu[2] = 2*atan(mu_link[2]) + pi();

For the probability parameter theta, say for example that I know it is in the vicinity of 0.7; is this a good way to set up the model with a prior:

parameters {
  real theta;
}
model {
  theta ~ normal(0.7, 0.5);
}

How can I ensure that theta stays in the range [0,1]? Would declaring it with bounds and a beta prior work, something like this (the beta(7,3) is just my guess at how to encode a value near 0.7):
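
parameters {
  real<lower=0, upper=1> theta;  // bounds keep theta in [0, 1]
}
model {
  theta ~ beta(7, 3);  // prior mean 7/(7+3) = 0.7
}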
Thanks!

Here’s what I’d recommend you try:

data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real theta_log_odds;
  vector[2] kappa_link;
  vector[2] mu;
}
model {
  real ps[2];
  real theta = inv_logit(theta_log_odds) ;
  vector[2] kappa = exp(kappa_link); // I think exp() is vectorized now

  kappa_link ~ normal(0,2);
  mu[1] ~ von_mises(0, 30) ; // check that this matches your knowledge
  mu[2] ~ von_mises(pi(), 30) ; // check that this matches your knowledge
  theta_log_odds ~ normal(1,1) ; // check that this matches your knowledge
  
  for (n in 1:N) {
    if (kappa[1] < 100) {
      ps[1] = log(theta) + von_mises_lpdf(y[n] | mu[1], kappa[1]);
    }
    else  {
      ps[1] = log(theta) + normal_lpdf(y[n] | mu[1], sqrt(1/kappa[1]));
    }
    if (kappa[2] < 100) {
      ps[2] = log1m(theta) + von_mises_lpdf(y[n] | mu[2], kappa[2]);
    }
    else {
      ps[2] = log1m(theta) + normal_lpdf(y[n] | mu[2], sqrt(1/kappa[2]));
    }
    target += log_sum_exp(ps);
  }
}
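
One thing to note: theta and kappa are local variables in the model block, so they won't show up in the sampler output; after sampling you can reconstruct them from the theta_log_odds and kappa_link draws (inverse-logit and exp, respectively).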

Thanks Mike for your suggestion. It is a nice idea to use von_mises instead of normal for the priors on mu. I have tried it with the complete script below, but I am still getting unexpected results:
kappa1: 6.418781e-02
kappa2: 1.153151e+12
mu1: -5.635822e-01
mu2: 3.680773e-09
theta: 3.057110e-01
Do you have any other ideas?

library(rstan)
rstan_options(auto_write = TRUE)

model_code <- "
data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real theta_log_odds;
  vector[2] kappa_link;
  vector[2] mu;
}
model {
  real ps[2];
  real theta = inv_logit(theta_log_odds);
  vector[2] kappa = exp(kappa_link);

  kappa_link ~ normal(0,2);
  mu[1] ~ von_mises(0, 30);
  mu[2] ~ von_mises(pi(), 30); 
  theta_log_odds ~ normal(1,1); 
  
  for (n in 1:N) {
    if (kappa[1] < 100) {
      ps[1] = log(theta) + von_mises_lpdf(y[n] | mu[1], kappa[1]);
    }
    else  {
      ps[1] = log(theta) + normal_lpdf(y[n] | mu[1], sqrt(1/kappa[1]));
    }
    if (kappa[2] < 100) {
      ps[2] = log1m(theta) + von_mises_lpdf(y[n] | mu[2], kappa[2]);
    }
    else {
      ps[2] = log1m(theta) + normal_lpdf(y[n] | mu[2], sqrt(1/kappa[2]));
    }
    target += log_sum_exp(ps);
  }
}"


N <- 200
theta <- 0.7
y1 <- rep(0, N*theta)
y2 <- rep(pi, N*(1-theta))
Y <- c(y1, y2)

model <- stan_model(model_code = model_code, allow_undefined = TRUE,
                    verbose = TRUE)
m <- sampling(model, data = list(N=N, y=Y), chains = 1)

m_summary <- summary(m)
vals <- m_summary$summary[, 'mean'] 
kappa1 <- exp(vals['kappa_link[1]'])
kappa2 <- exp(vals['kappa_link[2]'])
mu1 <- vals['mu[1]']
mu2 <- vals['mu[2]']
theta <- 1/(1+exp(-vals['theta_log_odds']))
print(c(kappa1, kappa2, mu1, mu2, theta))

The results are unstable; sometimes it gives correct results, and I am still getting the same warning messages. If you have any other ideas, I can give them a try.

Ah, shoot, I forgot that with von Mises mixtures like this you have to take care to keep away from the area of the parameter space where one or both of the components becomes uniform. Try priors on kappa that are tighter and/or shifted toward higher values.
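
For instance (the numbers here are just illustrative; tune them to what you actually know about the concentrations), replacing the kappa_link prior:

  kappa_link ~ normal(2, 0.5); // implies kappa = exp(kappa_link) roughly between 2.7 and 20 (within +/- 2 sd), away from the uniform regime near kappa = 0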
