Hi, I am working on a clustering problem where one of the components in my model is a Euclidean distance matrix.

In its simplest form, the Stan code looks like this:

```
data {
  int<lower=0> N; // Dimension of the data object
  int d; // Dimension of Latent space
  int Y[N,N]; // Input Sociomatrix
  real<lower=0> z_prior_sd;
}
transformed data {
  vector[d] zeros;
  vector[d] ones;
  vector[N] zeros2;
  vector[N] ones2;
  real gam = -1;
  ones = rep_vector(1, d);
  zeros = rep_vector(0, d);
  ones2 = rep_vector(1, N);
  zeros2 = rep_vector(0, N);
}
parameters {
  real alpha;
  vector[d] z[N];
}
model {
  for (i in 1:N) {
    to_vector(z[i]) ~ normal(0, z_prior_sd);
  }
  for (i in 1:N) {
    for (j in 1:N) {
      Y[i,j] ~ bernoulli(Phi(alpha + gam * distance(to_vector(z[i]), to_vector(z[j]))));
    }
  }
  alpha ~ normal(0, 10);
}
```

When I run the sampler, it keeps failing to initialize, with errors such as "gradient evaluated at the initial value is not finite".

Now the code above is a very watered-down version of the full model, and I have been able to identify that the issue lies with the `distance(z[i],z[j])` component. I am not exactly sure what the issue is, and I was wondering whether there is something fundamentally wrong with how I am defining this component, or whether it is something else.

The idea is that actors can be classed as neighbors based on their positions in an unobserved latent space (represented by `z[i]`). Here d=2 and the latent space is Euclidean.
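
Written out, the model in the code above is the following, where $\sigma_z$ is just shorthand introduced here for `z_prior_sd` and $\gamma = -1$ is the fixed `gam`:

$$
\Pr(Y_{ij} = 1 \mid \alpha, z) = \Phi\bigl(\alpha + \gamma\,\lVert z_i - z_j \rVert_2\bigr),
\qquad z_i \sim \mathcal{N}(0, \sigma_z^2 I_d),
\qquad \alpha \sim \mathcal{N}(0, 10^2).
$$

So the probability of a tie between actors $i$ and $j$ decreases as their latent positions move apart.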

Any insight is appreciated. Thanks in advance.

Just a quick note that when you have arrays of vectors (i.e., `vector[d] z[N]`), you don’t need to call `to_vector` when you index them, as indexing already returns a vector. For example:

```
Phi(alpha + gam * distance(z[i], z[j]))
```

As for the initialisation/gradient errors, it’s a bit hard to debug without data, but there’s a good chance that the issue is coming from the call to `Phi`. The `Phi` function is numerically a bit unstable: it underflows to 0 for inputs below roughly -37.5 and overflows to 1 for inputs above roughly 8.25. You can try using the `Phi_approx` function instead, an approximation that is a bit more robust.
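
A minimal sketch of that swap, assuming the rest of the model (data, parameters, priors) stays exactly as in the original post. Only the likelihood loop changes:

```stan
model {
  // Priors as in the original model.
  alpha ~ normal(0, 10);
  for (i in 1:N) {
    z[i] ~ normal(0, z_prior_sd);
  }
  // Same likelihood, with Phi replaced by Phi_approx.
  for (i in 1:N) {
    for (j in 1:N) {
      Y[i, j] ~ bernoulli(Phi_approx(alpha + gam * distance(z[i], z[j])));
    }
  }
}
```

Note that `Phi_approx` is an approximation to the normal CDF, so the fitted values of `alpha` may differ slightly from those under `Phi`.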

Additionally, a good way to debug initialisation issues is to `print` the input values:

```
for (i in 1:N) {
  for (j in 1:N) {
    // Declare the local variables first, then print them with labels
    real p = alpha + gam * distance(z[i], z[j]);
    real Phi_p = Phi(p);
    print("p = ", p, ", Phi(p) = ", Phi_p);
    Y[i, j] ~ bernoulli(Phi_p);
  }
}
```

This will give you an idea of what values are being passed to the functions here, and whether they’re what you’d expect.

Thanks @andrjohns for the suggestions. I did print the value of each intermediate quantity to check whether it is what I would expect, and they seem okay. I also tried the `Phi_approx` function, but the same issue persists.

I have attached the pseudo data that I am working with.

Sociomatrix.csv (8.8 KB)

Here `N=67, d=2, z_prior_sd=25`.

This is the error message that I get:

```
Chain 1: Rejecting initial value:
Chain 1: Log probability evaluates to log(0), i.e. negative infinity.
Chain 1: Stan can't start sampling from this initial value.
Chain 1:
Chain 1: Initialization between (-2, 2) failed after 100 attempts.
Chain 1: Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."
[1] "error occurred during calling the sampler; sampling not done"
```

I am just trying to understand whether this error results from a faulty line of code or from something fundamentally wrong with the way the model is defined.

Can you try with `init = 0`? Sometimes the initial values can fall outside the range that `Phi`/`Phi_approx` can handle numerically, which causes the failure.

Yeah, that is what I did and it seems to have solved the issue. I also changed the model likelihood to `bernoulli_logit` and used `init=0` in the sampler call.
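
For anyone reading later, here is a minimal sketch of what the `bernoulli_logit` version might look like. This is an illustration rather than the exact final code: it assumes the same data as the original post, inlines `gam = -1`, drops the unused `transformed data` vectors, and adds `lower`/`upper` bounds on `Y` and `d` that were not in the original.

```stan
data {
  int<lower=0> N;                  // number of actors
  int<lower=1> d;                  // dimension of the latent space (bound is an assumption)
  int<lower=0, upper=1> Y[N, N];   // binary sociomatrix (bounds are an assumption)
  real<lower=0> z_prior_sd;
}
parameters {
  real alpha;
  vector[d] z[N];                  // latent positions
}
model {
  alpha ~ normal(0, 10);
  for (i in 1:N) {
    z[i] ~ normal(0, z_prior_sd);
  }
  for (i in 1:N) {
    for (j in 1:N) {
      // Logit link on the linear predictor: no explicit CDF call is needed,
      // and gam = -1 from the original model is written as a minus sign.
      Y[i, j] ~ bernoulli_logit(alpha - distance(z[i], z[j]));
    }
  }
}
```

This uses the same array declaration syntax as the original post; recent Stan releases write `array[N, N] int Y` and `array[N] vector[d] z` instead. Switching from `Phi` to `bernoulli_logit` also changes the link from probit to logit, so `alpha` is on a different scale than in the original model.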

Thanks for your help with this.