Correlated posterior sample in logistic model with measurement error in probability scale

Dear Stan community,

I am toying with an example of logistic regression with measurement error and I require your assistance. Although the model fits well and returns the expected regression coefficients, I notice that some of the posterior samples are correlated. This is probably due to poor specification/ parametrization from my part. I would appretiate any tips on this regard. Below you may find the model specification code and a quick summary of the results.

A few key notes about the measurement error:
It represents an aereal exposure therefore the same effect can be shared among multiple individuals observed at a given location.
It’s logit transform is asumed to follow a normal distribution with known mean and sd specified with the data.
I am interested to asses it’s effect in the regression component at prevalence scale, therefore transforming through inv_logit.

There are 6 fixed effects and 1 measurement error included in this model. Though the estimates are in accordance with the results from a simplifed gam model. I observe there is high correlation in the posterior estimates between beta.2 (beta[2] fixed effect) and beta_x1 (measurement error). I can imagine this problem will amplify further with higher complexity models.

Thank you in advance for any suggestions.
CK.

data {
  
  // Dimensions
  int <lower = 1> M;
  int <lower = 1> N;
  int <lower = 1> N_s; // number of unique observations for measurement error
  
  // Measurement error mean and sd
  vector[N_s] mu_x1;
  vector[N_s] sigma_x1;
  
  // Outcome
  int <lower = 0, upper = 1> Y[N];
  
  // Design matrix
  matrix[N, M] Z;
  
  // location id
  int <lower = 1> loc[N];
  
}
parameters {
  
  vector[M] beta;
  real beta_x1;
  
  vector[N_s] std_eta_x1;
  
}

transformed parameters {
  
  vector[N_s] eta_x1;
  eta_x1 = mu_x1 + sigma_x1 .* std_eta_x1;
}

model {
  
  vector[N] x1;
  
  for (n in 1:N) {
    x1[n] = eta_x1[loc[n]];
  }
  
  // Regression coeficient priors
  beta ~ normal(0, 5);
  
  beta_x1 ~ normal(0, 5);
  
  // Measurement error priors
  std_eta_x1 ~ std_normal();
  
  // Likelihood
  Y ~ bernoulli_logit(Z*beta + beta_x1*inv_logit(to_vector(x1))); 

}

Maybe someone else here has a good analytical example, but in general I don’t believe there’s any reason to assume that posterior distributions for model parameters would be uncorrelated (even if you specify independent priors).

1 Like