Logistic regression with measurement errors / latent variable - low E-BFMI

rlain · December 10, 2019, 5:04pm

Hello,

I am trying to model logistic regression with a latent variable (or equivalently to perform the regression with measurement errors). This post (Non-centered parameterization of measurement error model) had a similar problem which was solved by only constraining the sd to positive real axis. I have been having issues with low E-BFMI and some scale issues.

More specifically, I am working with two separate logistic regression models shown in abridged form below. In the model we have unobserved noise Z and observed features of individuals X affecting a decision T. If decision is positive (T=1) then we can observe a binary outcome Y, otherwise not. The M decision-makers have different baseline probabilities for positive decisions modelled with the intercepts \alpha_t. Eventually we are interested in predicting the value for Y for those who were given a negative decision.

In the model, \beta coefficients have t_6 priors as per your recommendations and the intercepts have been given priors. The coefficient priors have all been non-centralized using the gamma-mixture representation shown in the manual.

\begin{align} Z &\sim N(0, 1) \\ T &\sim \text{Bernoulli}(\sigma(\alpha_t + \beta_{xt}x + \beta_{zt}z)), ~ \text{ where }~ t = 1, \ldots, M\\ Y ~ | ~T=1 &\sim \text{Bernoulli}(\sigma(\alpha_y + \beta_{xy}x + \beta_{zy}z)) \end{align}

The issue with this model is that even when the data has been generated exactly like the model, I have been getting warnings for low E-BFMI (and occasionally the coefficients \beta_{xy} and \beta_{zy} have been too large, but similar). The E-BFMI is usually between 0.2 and 0.3, sometimes above and sometimes below. According to some discussion (if I remember correctly it was Mr. Carpenter who said) that is the region where one should be concerned. There hasn’t been issues with divergences as in some examples show in the forum. I have run the model with adapt_delta at 0.9 and then there were some divergences, but those seemed false positives. I have read Martin’s blog post Identifying Non-identifiability but I think my model is a bit different because here the response is binary.

I got the feeling from Martin’s blog post that the issue might be related to having only largish values of X observed for the outcome Y. Those values are then in the tail of the sigmoid giving only little information about the parameters.

I would be very thankful for any advice. Please do ask, if my explanations were not sufficient.

data {
  int<lower=1> D; // Dimensions of the features and coefficient vectors
  int<lower=0> N_obs;  // Number of observed observations (with T = 1)
  int<lower=0> N_cens; // Number of censored observations (with T = 0)
  int<lower=0> M; // Number of judges
  real<lower=0> sigma_tau; // Prior for the variance parameters.
  int<lower=1, upper=M> jj_obs[N_obs];   // judge_ID array
  int<lower=1, upper=M> jj_cens[N_cens]; // judge_ID array
  int<lower=0, upper=1> dec_obs[N_obs];   // Decisions for the observed observations
  int<lower=0, upper=1> dec_cens[N_cens]; // Decisions for the censored observations
  row_vector[D] X_obs[N_obs];   // Features of the observed observations 
  row_vector[D] X_cens[N_cens]; // Features of the censored observations 
  int<lower=0, upper=1> y_obs[N_obs]; // Outcomes of the observed observations
}

parameters {
  // Latent variable
  vector[N_obs] Z_obs;
  vector[N_cens] Z_cens;
  
  // Intercepts and their variance parameters.
  real<lower=0> sigma_T;
  real<lower=0> sigma_Y;

  vector[M] alpha_T_raw;
  real alpha_Y_raw;

  // Temporary variables to compute the coefficients
  vector[D] a_XT;
  vector[D] a_XY;
  real<lower=0> a_ZT; // Presume latent variable has a positive coefficient.
  real<lower=0> a_ZY;

  // Shared precisions for the coefficients
  real<lower=0> tau_X;
  real<lower=0> tau_Z;

}

transformed parameters {
  
  // Coefficients
  vector[D] beta_XT;
  vector[D] beta_XY;
  real<lower=0> beta_ZT; // Presume latent variable has a positive coefficient.
  real<lower=0> beta_ZY;
  
  // Intercepts
  vector[M] alpha_T;
  real alpha_Y;

  beta_XT = a_XT / sqrt(tau_X);
  beta_XY = a_XY / sqrt(tau_X);
  beta_ZT = a_ZT / sqrt(tau_Z);
  beta_ZY = a_ZY / sqrt(tau_Z);
  
  alpha_T = sigma_T * alpha_T_raw;
  alpha_Y = sigma_Y * alpha_Y_raw;
}

model {
  // Latent variable
  Z_obs ~ normal(0, 1);
  Z_cens ~ normal(0, 1);
  
  // Intercept priors ...
  
  // Gamma-mixture representation of the coefs
  a_XT ~ normal(0, 1);
  a_XY ~ normal(0, 1);
  a_ZT ~ normal(0, 1);
  a_ZY ~ normal(0, 1);

  // nu = 6 -> nu/2 = 3
  tau_X ~ gamma(3, 3);
  tau_Z ~ gamma(3, 3);

  // Compute the regressions for the observed observations
  for(i in 1:N_obs){
    dec_obs[i] ~ bernoulli_logit(alpha_T[jj_obs[i]] + X_obs[i] * beta_XT + beta_ZT * Z_obs[i]);
    y_obs[i] ~ bernoulli_logit(alpha_Y + X_obs[i] * beta_XY + beta_ZY * Z_obs[i]);
  }

  // Compute the regression for the censored observations
  for(i in 1:N_cens)
    dec_cens[i] ~ bernoulli_logit(alpha_T[jj_cens[i]] + X_cens[i] * beta_XT + beta_ZT * Z_cens[i]);
}

bbbales2 · December 10, 2019, 11:41pm

If you’re going for this:

alpha_T ~ normal(0, sigma_T);
alpha_Y ~ normal(0, sigma_Y);

You’ll need to add some normal(0, 1) priors on these.

This corresponds to:

vector[M] alpha_T_raw;
real alpha_Y_raw;

in parameters,

alpha_T = sigma_T * alpha_T_raw;
alpha_Y = sigma_Y * alpha_Y_raw;

in transformed parameters, and

alpha_T_raw ~ normal(0, 1);
alpha_Y_raw ~ normal(0, 1);

in the model block. (check the non-centered parameterization stuff here: Stan User’s Guide and here: Diagnosing Biased Inference with Divergences)

Also if you don’t have any reason to do otherwise, probly roll with normal priors. You can get E-BFMI warnings with heavy-tailed distributions.

Edit: Also welcome to the forums!

rlain · December 11, 2019, 10:44am

Hi bbbales2 and thanks for your reply!

Actually I already have normal(0, 1) priors for the parameters as you suggested. The “raw” parameters have normal(0, sigma) priors and the sigmas have normal(0, 1) priors.

I’ll try this. I assume you mean similar normal-normal structure as for the intercepts. I have to dig into some calculus to compute the posterior predictive for those to do sensitivity analysis. :)

Thanks again for your help!

bbbales2 · December 11, 2019, 1:50pm

The raw ones should be normal(0, 1).

Shouldn’t need any calculus. We’re not doing any conjugacy stuff here. You should be able to just take the samples and run them through your generative model.

Topic		Replies	Views
Low E-BFMI for Bayesian correlation model Modeling fitting-issues	0	224	August 23, 2023
Fitting issues for model with measurement error Modeling	4	57	April 10, 2025
Logistic regression when the dependent variable is measured with error Modeling brms	4	636	March 23, 2023
Correlated posterior sample in logistic model with measurement error in probability scale Modeling techniques , logistic-regression , reparametrization	1	731	January 30, 2022
Brms prior speficiation in negative binomial model Modeling prior-choice , brms	2	800	November 30, 2022

Logistic regression with measurement errors / latent variable - low E-BFMI

Related topics