Estimating population contact: Single GP for multiple regressions

I have a dataset with populations and several binary features for each population. I also have latitude and longitude data for the populations.

Imagine the data looks as follows:

A B C lat long
1 0 0  12 0
1 1 0  11 0.5
0 0 1  15 5

I know that the presence or absence of some of these features is correlated. Most importantly, I know that there is some borrowing of features between populations. I also have some additional controls. I am not interested in the correlations between A, B, C, etc.; what I am interested in is estimating the amount of contact between the populations based on the presence or absence of all the features.

If I were only looking at one of the features, in brms I could do something like:

brm(bf(A ~ 1 + B + C + gp(lat, long)),
    family = bernoulli(),
    ...)

To estimate the models for all three features jointly, I could do:

brm(bf(A ~ 1 + B + C + gp(lat, long)) +
      bf(B ~ 1 + A + C + gp(lat, long)) +
      bf(C ~ 1 + B + A + gp(lat, long)),
    family = bernoulli(),
    ...)

But as I understand it, this estimates an independent GP for each response. From what I’ve read, it is not possible in brms to share the same GP term across multiple models.
Looking at the Stan code, I think that changing:

...
vector[N_A] gp_pred_A_1 = gp(Xgp_A_1, sdgp_A_1[1], lscale_A_1[1], zgp_A_1);
vector[N_B] gp_pred_B_1 = gp(Xgp_B_1, sdgp_B_1[1], lscale_B_1[1], zgp_B_1);
vector[N_C] gp_pred_C_1 = gp(Xgp_C_1, sdgp_C_1[1], lscale_C_1[1], zgp_C_1);
vector[N_A] mu_A = Intercept_A + rep_vector(0.0, N_A) + gp_pred_A_1;
vector[N_B] mu_B = Intercept_B + rep_vector(0.0, N_B) + gp_pred_B_1;
vector[N_C] mu_C = Intercept_C + rep_vector(0.0, N_C) + gp_pred_C_1;
...

To:

...
vector[N_A] gp_pred_A_1 = gp(Xgp_A_1, sdgp_A_1[1], lscale_A_1[1], zgp_A_1);
vector[N_A] mu_A = Intercept_A + rep_vector(0.0, N_A) + gp_pred_A_1;
vector[N_B] mu_B = Intercept_B + rep_vector(0.0, N_B) + gp_pred_A_1;
vector[N_C] mu_C = Intercept_C + rep_vector(0.0, N_C) + gp_pred_A_1;
...

(Xgp_A_1, Xgp_B_1, etc. are the same for all three models)

will share the same GP across all three models. So, my first question would be: is this a valid model?

Would it maybe also make sense to fix the length-scale (lscale) to be the same for all three models, but allow the sdgp’s to vary by model? Does that make any sense? Intuitively, I would also like to specify a model which allows a different GP per response, but where the GPs inform each other, so that the lscale and sdgp cannot vary completely freely.

The intuition I have is that while there is only a fixed amount of contact between populations x1 and x2, feature A may be more easily borrowed than feature B, for example, so a single GP might be too strong a constraint.

Final bonus question: is it possible to specify the GP so that it only applies to the presence of the response variable? That is, to state that the presence of a feature is spatially correlated but its absence is not.


Hey there!

I’m not a GP expert, but I thought I’d chime in anyway… As to whether these kinds of models are valid or make sense, I’d say “it depends”, which is almost always the correct answer but unfortunately not very helpful. With respect to your first question (sharing a single GP across the three responses by editing the generated Stan code), I guess that would not do what you want. However, I think it is possible to do what you describe next: fit three separate GPs with common hyperparameters (or have just one of them vary by outcome). I think your best bet would be to start with the latent variable GP chapter in the Stan user’s guide, fit a (single) GP, then have separate etas for the three different outcomes, and model on from there (varying marginal SDs etc.).
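Something like this is what I have in mind. It’s just a rough sketch, not your actual model: I’m assuming an exponentiated quadratic kernel and made-up data names (coords, yA, yB, yC), and I’m leaving out your other predictors and controls. The length-scale is shared, while the marginal SD (sdgp) and the standard normal latents (the etas / z) vary by outcome:

data {
  int<lower=1> N;                         // number of populations
  array[N] vector[2] coords;              // lat/long per population
  array[N] int<lower=0, upper=1> yA;
  array[N] int<lower=0, upper=1> yB;
  array[N] int<lower=0, upper=1> yC;
}
parameters {
  real<lower=0> lscale;                   // length-scale shared across outcomes
  vector<lower=0>[3] sdgp;                // marginal SD, one per outcome
  matrix[N, 3] z;                         // standard normal latents, one column per outcome
  vector[3] alpha;                        // intercepts
}
transformed parameters {
  matrix[N, 3] f;                         // latent GP values, one column per outcome
  {
    matrix[N, N] K = gp_exp_quad_cov(coords, 1.0, lscale);
    for (n in 1:N) K[n, n] += 1e-9;       // jitter for numerical stability
    matrix[N, N] L = cholesky_decompose(K);
    for (j in 1:3) f[, j] = sdgp[j] * (L * z[, j]);
  }
}
model {
  lscale ~ inv_gamma(5, 5);
  sdgp ~ normal(0, 1);
  alpha ~ normal(0, 2);
  to_vector(z) ~ std_normal();
  yA ~ bernoulli_logit(alpha[1] + f[, 1]);
  yB ~ bernoulli_logit(alpha[2] + f[, 2]);
  yC ~ bernoulli_logit(alpha[3] + f[, 3]);
}

If you want the length-scales to differ as well but still inform each other, you could replace the single lscale with three length-scales that share a hierarchical prior.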

Cheers,
Max


Thank you, Max, this makes a lot of sense. Regarding the final question, any idea how to restrict the GP to only apply when Y_i = 1? That is, to state that there is a spatial effect for the presence of a feature but not for its absence?

Well, you could try something like

for (i in 1:N) {
  if (Y[i] == 1) {
    1 ~ bernoulli_logit(alpha + ... + beta * x[i] + f[i]);  // include the latent GP term
  } else {
    0 ~ bernoulli_logit(alpha + ... + beta * x[i]);         // drop the GP term when the feature is absent
  }
}

(where f is the latent GP), which would be quite inefficient, but it’s not hard to vectorize. Although I’m not sure if that really makes sense, since there’s probably no real data to inform the GP (no variation in the dependent variable). But maybe I also misunderstood.
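Just in case it’s useful, a rough sketch of how the vectorized version could look (reusing the names from the snippet above and assuming x is a vector of length N; any other predictors hidden behind the ... would be added to eta as well):

model {
  vector[N] eta = alpha + beta * x;      // linear predictor without the GP term
  for (i in 1:N) {
    if (Y[i] == 1) {
      eta[i] += f[i];                    // add the latent GP only where the feature is present
    }
  }
  Y ~ bernoulli_logit(eta);              // single vectorized likelihood statement
}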

Cheers,
Max
