I am working on a model which involves calculating the neighborhood density of each observation, and then using that neighborhood density (ND) as a predictor term.
The ND of x_i is calculated as follows:
- find the nearest neighbor of x_i, it is at some distance d
- count how many other observations are withing d * factor , where factor is some real > 1
To calculate ND I use an NxM matrix, with N being the number of observations I’m modelling, and M the number of potential neighbors.
Now, so far I’ve been setting factor by trial and error, but I thought I’d give it a try building a model to estimate it.
The following is what I came up with:
functions{
real rank_of(real [] vec, real val) {
int Ns = size(vec);
real res = 0.0;
for (n in 1:Ns) {
if (vec[n] < val) {
res += 1;
} else {
break;
}
}
return res;
}
vector find_neighborhood(matrix NNS, real nn_limit) {
int Ns = rows(NNS);
int Ms = cols(NNS);
vector[Ns] nns;
for (n in 1:Ns) {
real row_n[Ms];
real min_val_fac;
row_n = to_array_1d(NNS[n]);
min_val_fac = row_n[1] * nn_limit;
nns[n] = rank_of(row_n, min_val_fac);
}
return(nns);
}
}
data {
int<lower=0> N;
int<lower=0> M; // matrix size
matrix[N, M] NNS; // matrix of neighbors, each row is sorted
int<lower=1> y[N]; // Outcomes
}
parameters {
real Intercept;
real beta_nn; // slope
real<lower=0,upper=1> nn_limit; // factor for the neighbors, should be within 1 and 2
}
transformed parameters {
real nn_limit2 = nn_limit+1;
}
model {
vector[N] nn_dense;
vector[N] nn_dense_s;
vector[N] mu;
nn_dense = find_neighborhood(NNS, nn_limit2);
nn_dense_s = (nn_dense-mean(nn_dense))/sd(nn_dense);
mu = rep_vector(Intercept, N) + beta_nn*nn_dense_s;
Intercept ~ normal(0, 5);
beta_nn ~ normal(0, 10);
nn_limit ~ beta(5,5);
y ~ poisson_log(mu);
}
generated quantities {
vector[N] nn_dense;
nn_dense = find_neighborhood(NNS, nn_limit);
}
N is 35 in my test case, and M some 3000.
The model samples well and does not throw any errors. However, by the nature of rank_of(), nn_dense ends up being round numbers, so I don’t know if this is an issue.
Since I haven’t seen models like this around, I’d be interesting in knowing whether someone has already come up with something more sophisticated/better for doing something similar.