I am working on a model which involves calculating the neighborhood density of each observation, and then using that neighborhood density (ND) as a predictor term.

The ND of *x_i* is calculated as follows:

- find the nearest neighbor of
*x_i*, it is at some distance*d* - count how many other observations are withing
*d***factor*, where*factor*is some real > 1

To calculate ND I use an NxM matrix, with N being the number of observations I’m modelling, and M the number of potential neighbors.

Now, so far I’ve been setting *factor* by trial and error, but I thought I’d give it a try building a model to estimate it.

The following is what I came up with:

```
functions{
real rank_of(real [] vec, real val) {
int Ns = size(vec);
real res = 0.0;
for (n in 1:Ns) {
if (vec[n] < val) {
res += 1;
} else {
break;
}
}
return res;
}
vector find_neighborhood(matrix NNS, real nn_limit) {
int Ns = rows(NNS);
int Ms = cols(NNS);
vector[Ns] nns;
for (n in 1:Ns) {
real row_n[Ms];
real min_val_fac;
row_n = to_array_1d(NNS[n]);
min_val_fac = row_n[1] * nn_limit;
nns[n] = rank_of(row_n, min_val_fac);
}
return(nns);
}
}
data {
int<lower=0> N;
int<lower=0> M; // matrix size
matrix[N, M] NNS; // matrix of neighbors, each row is sorted
int<lower=1> y[N]; // Outcomes
}
parameters {
real Intercept;
real beta_nn; // slope
real<lower=0,upper=1> nn_limit; // factor for the neighbors, should be within 1 and 2
}
transformed parameters {
real nn_limit2 = nn_limit+1;
}
model {
vector[N] nn_dense;
vector[N] nn_dense_s;
vector[N] mu;
nn_dense = find_neighborhood(NNS, nn_limit2);
nn_dense_s = (nn_dense-mean(nn_dense))/sd(nn_dense);
mu = rep_vector(Intercept, N) + beta_nn*nn_dense_s;
Intercept ~ normal(0, 5);
beta_nn ~ normal(0, 10);
nn_limit ~ beta(5,5);
y ~ poisson_log(mu);
}
generated quantities {
vector[N] nn_dense;
nn_dense = find_neighborhood(NNS, nn_limit);
}
```

N is 35 in my test case, and M some 3000.

The model samples well and does not throw any errors. However, by the nature of rank_of(), nn_dense ends up being round numbers, so I don’t know if this is an issue.

Since I haven’t seen models like this around, I’d be interesting in knowing whether someone has already come up with something more sophisticated/better for doing something similar.