Simple question: logistic regression with group level covariate and non-centered parameterization

Hi everyone!

New to Stan and Bayes with a little experience with JAGS. I’m working in preliminary version of a mode I will extend and make more complex as I understand more about the functionality. On the way I had several warnings that I could resolved following the recommendations in the manual. Among them, the non-centered parameterization. I have just 5 groups and a0 is the reference level for the other 2 fixed covariates. Simple question here

How do I add a group level covariate? I tried some alternatives I thought plausible but the results are plain different compare to the model I set up in Jags.

data {
  int N;
  int M; 
  int index_sites [N];
  int wean_mate [N]; //predictor
  int pregnancy [N]; //predictor
  int CoV_eid[N]; //outcome 
}                                                

parameters {
  real a0; 
  real b_wean_mate;
  real b_pregnancy;
  real<lower=0,upper=10> sigma_theta; 
  vector[M] epsilon;
  
}

transformed parameters {
  vector[M] theta_site;
  theta_site = sigma_theta * epsilon; // equivalent to normal(0,sigma_theta)
  
}

model {
  vector[N] eta;
  a0 ~ normal(0,80);         // standard deviation
  b_pregnancy ~ normal(0,80);
  b_wean_mate ~ normal(0,80);
  
  sigma_theta ~ normal(0,3); //cauchy(0,5) works also
  epsilon~normal(0,1);
  
  for(i in 1:N)
    eta[i] = a0 + b_wean_mate * wean_mate[i] + b_pregnancy * pregnancy[i] + theta_site[index_sites[i]];
  CoV_eid ~ bernoulli_logit(eta); //model
  
}

something in the transformed parameter block such as


data {
  real  data_group_covariate[M];

parameters {
 ...
 real b_group_covariate; 
}


transformed parameters {
  vector[M] theta_site;
  theta_site = b_group_covariate*data_group_covariate + sigma_theta * epsilon; // equivalent to normal(0,sigma_theta)
  
}

model {
 ...
 b_group_covariate~normal(0,80)
}

Thanks for the feedback.

Assuming you have M groups and N data points and that index_sites is the group label for each data point, then I’d assume you’d want to add to eta[i] a term that looked like b_group_covariate * data_group_covariate[index_site[i]].

I think it’s usually easier to write out models in their centered parameterization and then convert. You’re right that adding b_group_covariate * data_group_covariate does something strange to the implied prior on the theta_site. It’s easiest to write everything out and then see what can be non-centered (or what even needs to be non-centered) rather than trying to write it from the ground up.

Hi Ben:

Thank you very much for the response. I tried by setting up the mu in a following step in the model block and now the sampler does work and I get results that are decently similar to the ones I get from JAGS; but not exactly to be 100% relax about the model parameterization.

I’m posting this code. I do think is right, but definetely a check would be awesome. Thank you in advance

data {
int N;
int M;
int index_sites [N];
int wean_mate [N];
int pregnancy [N];
int data_group_covariate[M]
int CoV_eid[N];
}

parameters {
real a0;
real b_wean_mate;
real b_pregnancy;
real b_group_covariate;
real<lower=0, upper=10> sigma_theta;
vector[M] epsilon;
real mu;

}

transformed parameters {
vector[M] theta_site;
theta_site = mu + sigma_theta * epsilon;
}

model {
vector[N] eta;
a0 ~ normal(0,80);
b_wean_mate ~ normal(0,80);
b_pregnancy ~ normal(0,80);
b_group_covariate ~ normal(0,80);

sigma_theta ~ normal(0,3);
epsilon~normal(0,1);

for(j in 1:M)
mu~normal(b_group_covariate*data_group_covariate[j],3);

for(i in 1:N)
eta[i] = a0 + b_wean_mate * wean_mate[i] + b_pregnancy * pregnancy[i] + theta_site[index_sites[i]];
CoV_eid ~ bernoulli_logit(eta); //model
}

Oh no! Sorry for the delay. I missed this. Either way, this looks strange to me:

for(j in 1:M)
   mu ~ normal(b_group_covariate*data_group_covariate[j],3);

If you write that out it is:

mu ~ normal(b_group_covariate * data_group_covariate[1],3);
mu ~ normal(b_group_covariate * data_group_covariate[2],3);
...
mu ~ normal(b_group_covariate * data_group_covariate[M],3);

which is multiple priors on the variable which seems wrong.

Did you mean to declare mu to be a vector of length M and then do:

for(j in 1:M)
  mu[j] ~ normal(b_group_covariate*data_group_covariate[j],3);

perhaps?

That’s correct!

thanks Ben!

Does it match your previous fit better now? It’s probably worth doing a simulated data experiment to make sure this model is fitting the trends you want to detect to debug it. That make sense?

It does match now closely to the previous fit. I now what you mean with the simulated data. I think the way it is now makes me confident of the results.

1 Like