I discussed this model with Erik and Ben about a year ago and they said “Why would you want to do this?” That may be your response after reading this, but bear with me.
I wrote the stan code below to solve the alpha parameters for logistic kernel ridge regression. The kernel K
is an P x P
matrix of similarities of N
different observations. The difference between this and typical kernel regression is that the similarity measure for each N
observation is a multivariate euclidean distance of features transformed to a similarity with an RBF. The point of which is to allow each observation to be represented by any number of features and a varying number of measurements (repeated across space, not time). This is generally classified as Distribution Regression, see this paper. I discuss a bit more of the specifics of my case in this Cross-Validated question
The point of this question is to figure out how to include the quantity of measurements, call it eta
, per each N
observations as a parameter in the model. The code below treats each element of the matrix K
with the same weight. What I hope to do is add a parameter that partially pools the alpha weights based on the magnitude of eta
. A very similar idea is covered in Bayesian Approaches to Distribution Regression and PDF -
Who Supported Obama in 2012?: Ecological Inference through Distribution Regression
Stan code:
data {
int<lower=0> N; # number of observations/groups
int<lower=0> P; # dimension of similarity matrix K
matrix[P, P] K; # similarity matrix of observations/groups
vector[N] y; # binary target/response variable (presence/absence)
real lambda; # ridge penalty (could be learned)
}
transformed data{
matrix[N, N] m;
matrix[N, N] ident_N;
ident_N = diag_matrix(rep_vector(1, rows(K)));
m = K + lambda * ident_N;
}
parameters {
vector[N] alpha_hat;
}
transformed parameters{
vector[N] q;
vector[N] e_;
vector[N] diagW;
vector[N] pi_;
vector[N] spec;
vector[N] Kalpha;
vector[N] alpha;
for (n in 1:N) {
alpha[n] = 1.0 / N;
}
Kalpha = K * alpha;
spec = 1 + exp(-Kalpha);
for (n in 1:N){
pi_[n] = 1.0 / spec[n];
}
for (n in 1:N){
diagW[n] = pi_[n] * (1.0 - pi_[n]);
}
for (n in 1:N){
e_[n] = (y[n] - pi_[n]) / diagW[n];
}
for (n in 1:N){
q[n] = Kalpha[n] + e_[n];
}
}
model {
alpha_hat ~ normal(0,10);
q ~ normal(m * alpha_hat, 0.01);
}
generated quantities{
vector[N] yhat1;
vector[N] yhat2;
yhat1 = (1.0 + exp(-(K * alpha_hat)));
for (n in 1:N){
yhat2[n] = 1.0 / yhat1[n];
}
}
R Code:
library("rstan")
K = matrix(c(0.343, 0.288, 0.235, 0.225, 0.314, 0.246, 0.240, 0.246, 0.269, 0.344,
0.288, 0.404, 0.234, 0.203, 0.341, 0.320, 0.316, 0.300, 0.333, 0.328,
0.235, 0.234, 0.225, 0.186, 0.233, 0.176, 0.182, 0.172, 0.179, 0.272,
0.225, 0.203, 0.186, 0.252, 0.260, 0.239, 0.236, 0.288, 0.261, 0.257,
0.314, 0.341, 0.233, 0.260, 0.466, 0.409, 0.390, 0.476, 0.460, 0.436,
0.246, 0.320, 0.176, 0.239, 0.409, 0.509, 0.489, 0.562, 0.544, 0.320,
0.240, 0.316, 0.182, 0.236, 0.390, 0.489, 0.486, 0.537, 0.518, 0.309,
0.246, 0.300, 0.172, 0.288, 0.476, 0.562, 0.537, 0.715, 0.625, 0.362,
0.269, 0.333, 0.179, 0.261, 0.460, 0.544, 0.518, 0.625, 0.599, 0.365,
0.344, 0.328, 0.272, 0.257, 0.436, 0.320, 0.309, 0.362, 0.365, 0.481),
nrow = 10)
y = c(0, 0, 0, 0, 1, 1, 1, 1, 1, 0)
P = nrow(K)
N = length(y)
eta = c(5,10,3,26,12,7,6,12,3,8) # what to do with this?
model_dat <- list(K = K, y = y, P = P, N = N, lambda = 0.1)
fit_normal <- stan(file = "LMERR_stan.stan", data = model_dat)
If I am crazy, please set me straight. If you have a better idea for binary regression that allows for each observation to have multiple measurements spatially distributed around each ‘site’, please comment! If you have ideas for how to learn the ridge penalty lambda
, I am all ears! Any help is very much appreciated.
PS - this set-up looks a bit like a situation for multi-level model, but I do not believe that is the case because each observation/group is unique and I want to predict this on many groups that are not see in training. Then again, I could be misguided in that thinking!