# Mixed distribution after Mardia and Sutton (1977) and Error (Rejecting initial value) in Code

Hi!

I am trying to program up the model for cylindrical variables from the paper by Mardia and Sutton which can be found here mardia_sutton_cylindrical_dist.pdf

This is a combination of values on a cylinder, of which one is cyclic and described by a von Mises distribution and the other is described by a normal distribution.
My final goal is to cluster/classify my data points, I have programmed it up in a similar way for only Gaussians.

Unfortunately I can’t write formula in here, but basically I want to program the 4 equations which can be found in (1.1) (1.2) and (1.3).

I generate the 4 parameters, mu and kappa for the von Mises distribution and mu and sigma for the normal distribution in the parameters block.
In the transformed parameters block I apply the equations in (1.3) and in the model block I apply the equation (1.2) before I finally use the equation (1.1).

Now I get the error even though afterward the warmup and sampling starts:

``````Rejecting initial value:
Log probability evaluates to log(0), i.e. negative infinity.
Stan can't start sampling from this initial value.
``````

I don’t know where this comes from, maybe someone knows an answer?
Also do you see anything wrong with my approach to this code for the model?

Here is my code:

``````data {
int         K;  // number classes
int         N;  // number of all data points
vector[N]    angle; //theta
vector[N]    dist; //x
}

parameters{
// variables for von mises distribution, mu and kappa
vector [K] a_mu;
vector<lower=0, upper=90>[K]  kappa;
// variables for normal distribution, mu and sigma
vector [K] d_mu;
vector<lower=0, upper=6> [K] d_sigma;

simplex [K] weights;
}

transformed parameters{
real p_1 = 0.5;
real p_2 = 0.5;
real p = sqrt(pow(p_1,2) + pow(p_2,2));

vector[K] d_sigma_c;
for (k in 1:K){
d_sigma_c[k] = d_sigma[k] * (1 - pow(p,2));
}

model{
for (n in 1:N) {
vector [K] pb;
vector[K] d_mu_c;
for (k in 1:K) {

// calculate d_mu_c with the helper function
d_mu_c[k] = d_mu[k] + d_sigma[k] * sqrt(kappa[k])
* (p_1 *(cos(angle[n]) - cos(a_mu[k])) + p_2*(sin(angle[n]) - sin(a_mu[k])));

//calculate real function
pb[k] = log(weights[k]) + (1/(2 * pi() * bessel_first_kind(0,kappa[k]))
* exp(kappa[k] * (cos(angle[n] - a_mu[k]))) * 1/sqrt(2 * pi() * d_sigma_c[k])
* exp(pow((dist[n] - d_mu_c[k]),2) / (2 * d_sigma_c[k])));
}
target += log_sum_exp (pb);
}
}``````

An Initial Value Rejected error means Stan was never able to guess a set of initial conditions that produced a valid (non-infinity non-nan) likelihood. There’s probly a bug somewhere in this where there’s an index out of bounds or a parameter out of range of a function or something. I think there are some issues with Stan 2.15 not reporting all the errors it should (I think 2.16 will fix this).

Before anything else though, is there any reason you aren’t using the built in Von-Mises/normal distributions for this?

Somehow I was convinced I could not use them because I calculate the mu and sigma differently beforehand, but I guess this was wrong…
So the error may be anywhere? In the model and in the parameters block?

I just took your model and ran it with this R file (there was a missing ‘}’ before the model block it needed before it compiled):

``````library(rstan)

K = 10
N = 10
angle = runif(N) * 3.0
dist = runif(N)

stan("~/Downloads/model.stan", data = list(K = K, N = N, angle = angle, dist = dist), cores = 1)
``````

It compiled and initialized and ran. There were a mere 1012 divergent transitions after warmup but it got initialized :D (if you get divergent transitions when you’re running, check out http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup). Any chance the values you’re passing in are not valid? K > 0, N > 0?

Oh I guess I missed this bracket when I copied the code. I use cmdStan but I actually tried and generated my data as you did and it could be initialized completely fine without any warning, so I guess my input data was not valid somehow.
But I think the model itself does not work, the initial values stayed the same through the whole sampling process…

Also, if I use the built in von Mises distribution and the normal distribution like this

``````pb[k] = von_mises_lpdf(angle[n] | a_mu[k], kappa[k]) * normal_lpdf(dist[n] | d_mu_c[k] , d_sigma_c[k]);
``````

I get another error which I haven’t got before:

``````Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception thrown at line 46: normal_lpdf: Location parameter is -nan, but must be finite!``````

Exception thrown at line 46: normal_lpdf: Location parameter is -nan, but must be finite!

This is good (and by that I mean useful :D)! It’s saying that one of your `d_mu_c[k]` variables is a NaN. Maybe there was a divide by zero? Or a square root of a negative number? Check your data, and then just look through your code for these mistakes.

The other source of NaNs are uninitialized arrays. If not all elements of d_mu_c got written to, some of them my be NaNs (they default to NaNs I believe).

But definitely check your input data. Maybe there is a NaN in it. I think since we’ve seen the model run with made up data, it’s less likely a coding thing and more likely an input thing.

The Stan PDFs do substantial validation of the input values, hence they will flag bad inputs that a custom implementation would not.

I have tried again to run my model with your generated data in R with the built in von Mises/normal distributions and get more errors than before?

First I get

``````Rejecting initial value:
Error evaluating the log probability at the initial value.
``````

so this is not working at all

I have 8 Exceptions with my normal_lpdf, one time that the location parameter is -inf, and 7 times this but with different numbers:

``````Exception thrown at line 65: normal_lpdf: Scale parameter is -0.0384378, but must be > 0!
``````

which depends on whether this equation is positive or negative

``````d_mu_c[k] = d_mu[k] + d_sigma[k] * sqrt(kappa[k])
* (rho_1 *(cos(angle[n]) - cos(a_mu[k])) + rho_2*(sin(angle[n]) - sin(a_mu[k])));
``````

I am pretty much lost right now

Also:
I tried using pairs(), but I get the error message

``````Error in plot.new() : figure margins too large
``````

Could you post the updated model with the built in distributions?

Exception thrown at line 65: normal_lpdf: Scale parameter is -0.0384378, but must be > 0!

This just means that the standard deviation you supply to the normal distribution must be positive, which it should be for things to make sense (this is what sigma is an element of R+ means in the manual).

I get different error messages each time I run my model, which I guess depends on what random parameters and values are generated each run.

I use this script to generate data an start my model:

``````#!/usr/bin/env Rscript
library(rstan)

K = 4
N = 10
angle = runif(N) * 3.0
dist = runif(N)

uni <- stan("/home/stan/uni_mises.stan", data = list(K = K, N = N, angle = angle, dist = dist), iter = 1000, cores = 1)

pairs(uni)
``````

I always get many divergent transitions, and increasing adapt_delta to 0.9999 does not help completely.

``````There were 2 chains where the estimated Bayesian Fraction of Missing Information was low.
``````

And I sometimes get:

``````Exception initializing step size.
``````

or

``````the following parameters were dropped because they are constant
d_sigma[4] d_sigma_c[1] d_sigma_c[2] d_sigma_c[3] d_sigma_c[4] rho
``````

I don’t know where to start, because the error messages change every time. I tried using a seed (234), with this I ‘only’ hat divergent transitions and the low Bayesian Fraction of Missing Information.

Updated model:

``````data {
int         K;  // number classes
int         N;  // number of all data points
vector[N]    angle; //theta
vector[N]    dist; //x
}

parameters{
// variables for von mises distribution, mu and kappa
vector<lower=0, upper=6.28318530718> [K] a_mu;
vector<lower=0, upper=8>[K]  kappa;
// variables for normal distribution, mu and sigma
vector<lower=0, upper=18> [K] d_mu;
vector<lower=0, upper=6> [K] d_sigma;

real<lower=0, upper=1> rho_1;
real<lower=0, upper=1-rho_1> rho_2;
}

transformed parameters{
real rho = sqrt(pow(rho_1,2) + pow(rho_2,2)); //eq 1.3
vector[K] d_sigma_c;
for (k in 1:K){
d_sigma_c[k] =  pow(d_sigma[k],2) * (1 - pow(rho,2));
}
}

model{
for (n in 1:N) {
vector [K] pb;
vector[K] d_mu_c;
for (k in 1:K) {
d_mu_c[k] = d_mu[k] + d_sigma[k] * sqrt(kappa[k])
* (rho_1 *(cos(angle[n]) - cos(a_mu[k])) + rho_2*(sin(angle[n]) - sin(a_mu[k])));

pb[k] = von_mises_lpdf(angle[n] | a_mu[k], kappa[k]) * normal_lpdf(dist[n] | d_mu_c[k] , d_sigma_c[k]);

}
target += log_sum_exp(pb);
}
}``````

Here:

``````real<lower=0, upper=1> rho_1;
real<lower=0, upper=1-rho_1> rho_2;
``````

You probably want to make rho into a length 2 simplex. I’m not sure this is valid syntax.

And here:

``````pb[k] = von_mises_lpdf(angle[n] | a_mu[k], kappa[k]) * normal_lpdf(dist[n] | d_mu_c[k] , d_sigma_c[k]);
``````

The _lpdf functions return log probabilities, so you’ll want to add these, not multiply (I’m assuming this is a mixture model). Also I think there’s a simplex missing (check the manual finite mixture model and you’ll see what I mean – it’s the theta variable).

Don’t use the code I was using to generate data. I was just using that to see if it was possible to initialize the model. Generate test data according to your model (you can do that with generated quantities blocks in Stan or however you please).

I’d recommend taking a step back from the mixture model. Just fit some data that has an angular component and a normal component using the default Stan parameterizations. Once you get that working, look at alternate parameterizations or try the mixture model. If you wanna make a big model that works, you gotta start with a small model that works :D. It’s really hard to get it all right from the beginning.

It actually works right now, thank you so much for your help and time!