Logistic regression with measurement errors / latent variable - low E-BFMI


I am trying to model logistic regression with a latent variable (or equivalently to perform the regression with measurement errors). This post (Non-centered parameterization of measurement error model) had a similar problem which was solved by only constraining the sd to positive real axis. I have been having issues with low E-BFMI and some scale issues.

More specifically, I am working with two separate logistic regression models shown in abridged form below. In the model we have unobserved noise Z and observed features of individuals X affecting a decision T. If decision is positive (T=1) then we can observe a binary outcome Y, otherwise not. The M decision-makers have different baseline probabilities for positive decisions modelled with the intercepts \alpha_t. Eventually we are interested in predicting the value for Y for those who were given a negative decision.

In the model, \beta coefficients have t_6 priors as per your recommendations and the intercepts have been given priors. The coefficient priors have all been non-centralized using the gamma-mixture representation shown in the manual.

\begin{align} Z &\sim N(0, 1) \\ T &\sim \text{Bernoulli}(\sigma(\alpha_t + \beta_{xt}x + \beta_{zt}z)), ~ \text{ where }~ t = 1, \ldots, M\\ Y ~ | ~T=1 &\sim \text{Bernoulli}(\sigma(\alpha_y + \beta_{xy}x + \beta_{zy}z)) \end{align}

The issue with this model is that even when the data has been generated exactly like the model, I have been getting warnings for low E-BFMI (and occasionally the coefficients \beta_{xy} and \beta_{zy} have been too large, but similar). The E-BFMI is usually between 0.2 and 0.3, sometimes above and sometimes below. According to some discussion (if I remember correctly it was Mr. Carpenter who said) that is the region where one should be concerned. There hasn’t been issues with divergences as in some examples show in the forum. I have run the model with adapt_delta at 0.9 and then there were some divergences, but those seemed false positives. I have read Martin’s blog post Identifying Non-identifiability but I think my model is a bit different because here the response is binary.

I got the feeling from Martin’s blog post that the issue might be related to having only largish values of X observed for the outcome Y. Those values are then in the tail of the sigmoid giving only little information about the parameters.

I would be very thankful for any advice. Please do ask, if my explanations were not sufficient.

data {
  int<lower=1> D; // Dimensions of the features and coefficient vectors
  int<lower=0> N_obs;  // Number of observed observations (with T = 1)
  int<lower=0> N_cens; // Number of censored observations (with T = 0)
  int<lower=0> M; // Number of judges
  real<lower=0> sigma_tau; // Prior for the variance parameters.
  int<lower=1, upper=M> jj_obs[N_obs];   // judge_ID array
  int<lower=1, upper=M> jj_cens[N_cens]; // judge_ID array
  int<lower=0, upper=1> dec_obs[N_obs];   // Decisions for the observed observations
  int<lower=0, upper=1> dec_cens[N_cens]; // Decisions for the censored observations
  row_vector[D] X_obs[N_obs];   // Features of the observed observations 
  row_vector[D] X_cens[N_cens]; // Features of the censored observations 
  int<lower=0, upper=1> y_obs[N_obs]; // Outcomes of the observed observations

parameters {
  // Latent variable
  vector[N_obs] Z_obs;
  vector[N_cens] Z_cens;
  // Intercepts and their variance parameters.
  real<lower=0> sigma_T;
  real<lower=0> sigma_Y;

  vector[M] alpha_T_raw;
  real alpha_Y_raw;

  // Temporary variables to compute the coefficients
  vector[D] a_XT;
  vector[D] a_XY;
  real<lower=0> a_ZT; // Presume latent variable has a positive coefficient.
  real<lower=0> a_ZY;

  // Shared precisions for the coefficients
  real<lower=0> tau_X;
  real<lower=0> tau_Z;


transformed parameters {
  // Coefficients
  vector[D] beta_XT;
  vector[D] beta_XY;
  real<lower=0> beta_ZT; // Presume latent variable has a positive coefficient.
  real<lower=0> beta_ZY;
  // Intercepts
  vector[M] alpha_T;
  real alpha_Y;

  beta_XT = a_XT / sqrt(tau_X);
  beta_XY = a_XY / sqrt(tau_X);
  beta_ZT = a_ZT / sqrt(tau_Z);
  beta_ZY = a_ZY / sqrt(tau_Z);
  alpha_T = sigma_T * alpha_T_raw;
  alpha_Y = sigma_Y * alpha_Y_raw;

model {
  // Latent variable
  Z_obs ~ normal(0, 1);
  Z_cens ~ normal(0, 1);
  // Intercept priors ...
  // Gamma-mixture representation of the coefs
  a_XT ~ normal(0, 1);
  a_XY ~ normal(0, 1);
  a_ZT ~ normal(0, 1);
  a_ZY ~ normal(0, 1);

  // nu = 6 -> nu/2 = 3
  tau_X ~ gamma(3, 3);
  tau_Z ~ gamma(3, 3);

  // Compute the regressions for the observed observations
  for(i in 1:N_obs){
    dec_obs[i] ~ bernoulli_logit(alpha_T[jj_obs[i]] + X_obs[i] * beta_XT + beta_ZT * Z_obs[i]);
    y_obs[i] ~ bernoulli_logit(alpha_Y + X_obs[i] * beta_XY + beta_ZY * Z_obs[i]);

  // Compute the regression for the censored observations
  for(i in 1:N_cens)
    dec_cens[i] ~ bernoulli_logit(alpha_T[jj_cens[i]] + X_cens[i] * beta_XT + beta_ZT * Z_cens[i]);


If you’re going for this:

alpha_T ~ normal(0, sigma_T);
alpha_Y ~ normal(0, sigma_Y);

You’ll need to add some normal(0, 1) priors on these.

This corresponds to:

vector[M] alpha_T_raw;
real alpha_Y_raw;

in parameters,

alpha_T = sigma_T * alpha_T_raw;
alpha_Y = sigma_Y * alpha_Y_raw;

in transformed parameters, and

alpha_T_raw ~ normal(0, 1);
alpha_Y_raw ~ normal(0, 1);

in the model block. (check the non-centered parameterization stuff here: Stan User’s Guide and here: Diagnosing Biased Inference with Divergences)

Also if you don’t have any reason to do otherwise, probly roll with normal priors. You can get E-BFMI warnings with heavy-tailed distributions.

Edit: Also welcome to the forums!

Hi bbbales2 and thanks for your reply!

Actually I already have normal(0, 1) priors for the parameters as you suggested. The “raw” parameters have normal(0, sigma) priors and the sigmas have normal(0, 1) priors.

I’ll try this. I assume you mean similar normal-normal structure as for the intercepts. I have to dig into some calculus to compute the posterior predictive for those to do sensitivity analysis. :)

Thanks again for your help!

The raw ones should be normal(0, 1).

Shouldn’t need any calculus. We’re not doing any conjugacy stuff here. You should be able to just take the samples and run them through your generative model.

1 Like