Mixed distribution after Mardia and Sutton (1977) and Error (Rejecting initial value) in Code

Jennifer · June 22, 2017, 12:46pm

Hi!

I am trying to program up the model for cylindrical variables from the paper by Mardia and Sutton which can be found here mardia_sutton_cylindrical_dist.pdf

This is a combination of values on a cylinder, of which one is cyclic and described by a von Mises distribution and the other is described by a normal distribution.
My final goal is to cluster/classify my data points, I have programmed it up in a similar way for only Gaussians.

Unfortunately I can’t write formula in here, but basically I want to program the 4 equations which can be found in (1.1) (1.2) and (1.3).

I generate the 4 parameters, mu and kappa for the von Mises distribution and mu and sigma for the normal distribution in the parameters block.
In the transformed parameters block I apply the equations in (1.3) and in the model block I apply the equation (1.2) before I finally use the equation (1.1).

Now I get the error even though afterward the warmup and sampling starts:

Rejecting initial value:
Log probability evaluates to log(0), i.e. negative infinity.
Stan can't start sampling from this initial value.

I don’t know where this comes from, maybe someone knows an answer?
Also do you see anything wrong with my approach to this code for the model?

Here is my code:

data {
    int         K;  // number classes
    int         N;  // number of all data points
    vector[N]    angle; //theta
    vector[N]    dist; //x
}

parameters{
  // variables for von mises distribution, mu and kappa
  vector [K] a_mu;
  vector<lower=0, upper=90>[K]  kappa;
  // variables for normal distribution, mu and sigma
  vector [K] d_mu;
  vector<lower=0, upper=6> [K] d_sigma;	

  simplex [K] weights;
}

transformed parameters{
  real p_1 = 0.5;
  real p_2 = 0.5;
  real p = sqrt(pow(p_1,2) + pow(p_2,2));

  vector[K] d_sigma_c;
  for (k in 1:K){
    d_sigma_c[k] = d_sigma[k] * (1 - pow(p,2));
  }

model{
  for (n in 1:N) {
    vector [K] pb;
    vector[K] d_mu_c;
    for (k in 1:K) {
 
      // calculate d_mu_c with the helper function
      d_mu_c[k] = d_mu[k] + d_sigma[k] * sqrt(kappa[k]) 
      * (p_1 *(cos(angle[n]) - cos(a_mu[k])) + p_2*(sin(angle[n]) - sin(a_mu[k])));
  
      //calculate real function
     pb[k] = log(weights[k]) + (1/(2 * pi() * bessel_first_kind(0,kappa[k])) 
  * exp(kappa[k] * (cos(angle[n] - a_mu[k]))) * 1/sqrt(2 * pi() * d_sigma_c[k])
  * exp(pow((dist[n] - d_mu_c[k]),2) / (2 * d_sigma_c[k])));
    }
    target += log_sum_exp (pb);
  }
}

bbbales2 · June 22, 2017, 3:42pm

An Initial Value Rejected error means Stan was never able to guess a set of initial conditions that produced a valid (non-infinity non-nan) likelihood. There’s probly a bug somewhere in this where there’s an index out of bounds or a parameter out of range of a function or something. I think there are some issues with Stan 2.15 not reporting all the errors it should (I think 2.16 will fix this).

Before anything else though, is there any reason you aren’t using the built in Von-Mises/normal distributions for this?

Jennifer · June 22, 2017, 4:03pm

Somehow I was convinced I could not use them because I calculate the mu and sigma differently beforehand, but I guess this was wrong…
So the error may be anywhere? In the model and in the parameters block?

bbbales2 · June 22, 2017, 4:28pm

I just took your model and ran it with this R file (there was a missing ‘}’ before the model block it needed before it compiled):

library(rstan)

K = 10
N = 10
angle = runif(N) * 3.0
dist = runif(N)

stan("~/Downloads/model.stan", data = list(K = K, N = N, angle = angle, dist = dist), cores = 1)

It compiled and initialized and ran. There were a mere 1012 divergent transitions after warmup but it got initialized :D (if you get divergent transitions when you’re running, check out http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup). Any chance the values you’re passing in are not valid? K > 0, N > 0?

Jennifer · June 23, 2017, 9:10am

Oh I guess I missed this bracket when I copied the code. I use cmdStan but I actually tried and generated my data as you did and it could be initialized completely fine without any warning, so I guess my input data was not valid somehow.
But I think the model itself does not work, the initial values stayed the same through the whole sampling process…

Also, if I use the built in von Mises distribution and the normal distribution like this

pb[k] = von_mises_lpdf(angle[n] | a_mu[k], kappa[k]) * normal_lpdf(dist[n] | d_mu_c[k] , d_sigma_c[k]);

I get another error which I haven’t got before:

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception thrown at line 46: normal_lpdf: Location parameter is -nan, but must be finite!

bbbales2 · June 23, 2017, 2:02pm

Exception thrown at line 46: normal_lpdf: Location parameter is -nan, but must be finite!

This is good (and by that I mean useful :D)! It’s saying that one of your d_mu_c[k] variables is a NaN. Maybe there was a divide by zero? Or a square root of a negative number? Check your data, and then just look through your code for these mistakes.

The other source of NaNs are uninitialized arrays. If not all elements of d_mu_c got written to, some of them my be NaNs (they default to NaNs I believe).

But definitely check your input data. Maybe there is a NaN in it. I think since we’ve seen the model run with made up data, it’s less likely a coding thing and more likely an input thing.

betanalpha · June 23, 2017, 3:00pm

The Stan PDFs do substantial validation of the input values, hence they will flag bad inputs that a custom implementation would not.

Jennifer · June 26, 2017, 10:12am

I have tried again to run my model with your generated data in R with the built in von Mises/normal distributions and get more errors than before?

First I get

Rejecting initial value:
Error evaluating the log probability at the initial value.

so this is not working at all

I have 8 Exceptions with my normal_lpdf, one time that the location parameter is -inf, and 7 times this but with different numbers:

Exception thrown at line 65: normal_lpdf: Scale parameter is -0.0384378, but must be > 0!

which depends on whether this equation is positive or negative

d_mu_c[k] = d_mu[k] + d_sigma[k] * sqrt(kappa[k]) 
  * (rho_1 *(cos(angle[n]) - cos(a_mu[k])) + rho_2*(sin(angle[n]) - sin(a_mu[k])));

I am pretty much lost right now

Also:
I tried using pairs(), but I get the error message

Error in plot.new() : figure margins too large

Anybody had this issue before?

bbbales2 · June 26, 2017, 3:25pm

Could you post the updated model with the built in distributions?

Exception thrown at line 65: normal_lpdf: Scale parameter is -0.0384378, but must be > 0!

This just means that the standard deviation you supply to the normal distribution must be positive, which it should be for things to make sense (this is what sigma is an element of R+ means in the manual).

Jennifer · June 27, 2017, 9:31am

I get different error messages each time I run my model, which I guess depends on what random parameters and values are generated each run.

I use this script to generate data an start my model:

#!/usr/bin/env Rscript
library(rstan)

K = 4
N = 10
angle = runif(N) * 3.0
dist = runif(N)

uni <- stan("/home/stan/uni_mises.stan", data = list(K = K, N = N, angle = angle, dist = dist), iter = 1000, cores = 1)

pairs(uni)

I always get many divergent transitions, and increasing adapt_delta to 0.9999 does not help completely.
Additionally, I sometimes get

There were 2 chains where the estimated Bayesian Fraction of Missing Information was low.

And I sometimes get:

Exception initializing step size.
Posterior is improper. Please check your model.

or

the following parameters were dropped because they are constant
d_sigma[4] d_sigma_c[1] d_sigma_c[2] d_sigma_c[3] d_sigma_c[4] rho

I don’t know where to start, because the error messages change every time. I tried using a seed (234), with this I ‘only’ hat divergent transitions and the low Bayesian Fraction of Missing Information.

Updated model:

data {
  int         K;  // number classes
  int         N;  // number of all data points
  vector[N]    angle; //theta
  vector[N]    dist; //x
}

parameters{
  // variables for von mises distribution, mu and kappa
  vector<lower=0, upper=6.28318530718> [K] a_mu;
  vector<lower=0, upper=8>[K]  kappa;
  // variables for normal distribution, mu and sigma
  vector<lower=0, upper=18> [K] d_mu;
  vector<lower=0, upper=6> [K] d_sigma;	

  real<lower=0, upper=1> rho_1;
  real<lower=0, upper=1-rho_1> rho_2;
}

transformed parameters{
  real rho = sqrt(pow(rho_1,2) + pow(rho_2,2)); //eq 1.3
  vector[K] d_sigma_c;
  for (k in 1:K){
    d_sigma_c[k] =  pow(d_sigma[k],2) * (1 - pow(rho,2));
  }
}

model{
  for (n in 1:N) {
    vector [K] pb;
    vector[K] d_mu_c;
    for (k in 1:K) {
        d_mu_c[k] = d_mu[k] + d_sigma[k] * sqrt(kappa[k]) 
      * (rho_1 *(cos(angle[n]) - cos(a_mu[k])) + rho_2*(sin(angle[n]) - sin(a_mu[k]))); 

      pb[k] = von_mises_lpdf(angle[n] | a_mu[k], kappa[k]) * normal_lpdf(dist[n] | d_mu_c[k] , d_sigma_c[k]);

    }
   target += log_sum_exp(pb);
  }
}

bbbales2 · June 27, 2017, 1:41pm

Here:

real<lower=0, upper=1> rho_1;
real<lower=0, upper=1-rho_1> rho_2;

You probably want to make rho into a length 2 simplex. I’m not sure this is valid syntax.

And here:

pb[k] = von_mises_lpdf(angle[n] | a_mu[k], kappa[k]) * normal_lpdf(dist[n] | d_mu_c[k] , d_sigma_c[k]);

The _lpdf functions return log probabilities, so you’ll want to add these, not multiply (I’m assuming this is a mixture model). Also I think there’s a simplex missing (check the manual finite mixture model and you’ll see what I mean – it’s the theta variable).

Don’t use the code I was using to generate data. I was just using that to see if it was possible to initialize the model. Generate test data according to your model (you can do that with generated quantities blocks in Stan or however you please).

I’d recommend taking a step back from the mixture model. Just fit some data that has an angular component and a normal component using the default Stan parameterizations. Once you get that working, look at alternate parameterizations or try the mixture model. If you wanna make a big model that works, you gotta start with a small model that works :D. It’s really hard to get it all right from the beginning.

Jennifer · June 29, 2017, 2:29pm

It actually works right now, thank you so much for your help and time!

Topic		Replies	Views
Join two cluster in a mixed distribution Modeling	3	679	July 22, 2017
Initial value problem Modeling	2	1776	January 27, 2018
Rejection Initial value Modeling	5	2046	July 3, 2017
Uniform circular distribution Modeling	3	868	September 3, 2019
Errors thrown with von mises distribution Modeling fitting-issues	13	1047	March 23, 2024

Mixed distribution after Mardia and Sutton (1977) and Error (Rejecting initial value) in Code

Related topics