# Logistic regression with measurement errors / latent variable - low E-BFMI

Hello,

I am trying to fit a logistic regression with a latent variable (or, equivalently, to perform the regression with measurement errors). This post (Non-centered parameterization of measurement error model) dealt with a similar problem, which was solved simply by constraining the sd to the positive real axis. In my case I have been having issues with low E-BFMI and some scale issues.

More specifically, I am working with two coupled logistic regression models, shown in abridged form below. In the model we have unobserved noise Z and observed features of individuals X affecting a decision T. If the decision is positive (T=1) we can observe a binary outcome Y; otherwise we cannot. The M decision-makers have different baseline probabilities for positive decisions, modelled with the intercepts \alpha_t. Eventually we are interested in predicting the value of Y for those who received a negative decision.

In the model, the \beta coefficients have t_6 priors, as per the recommendations, and the intercepts have been given priors as well. The coefficient priors have all been non-centered using the gamma-mixture representation shown in the manual.
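For reference, the scale-mixture representation I am using is the standard one: with

\begin{align} a &\sim N(0, 1), \qquad \tau \sim \text{Gamma}(\nu/2, \nu/2), \\ \beta &= a / \sqrt{\tau}, \end{align}

the coefficient \beta has a t_\nu(0, 1) distribution; with \nu = 6 this gives the gamma(3, 3) priors on the precisions in the code below.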

\begin{align} Z &\sim N(0, 1) \\ T &\sim \text{Bernoulli}(\sigma(\alpha_t + \beta_{xt}x + \beta_{zt}z)), ~ \text{ where }~ t = 1, \ldots, M\\ Y ~ | ~T=1 &\sim \text{Bernoulli}(\sigma(\alpha_y + \beta_{xy}x + \beta_{zy}z)) \end{align}

The issue with this model is that even when the data have been generated exactly according to the model, I have been getting warnings about low E-BFMI (and occasionally the coefficients \beta_{xy} and \beta_{zy} have come out too large, though similar to each other). The E-BFMI is usually between 0.2 and 0.3, sometimes above and sometimes below. According to some discussion (if I remember correctly it was Mr. Carpenter who said so), that is the region where one should be concerned. There haven't been issues with divergences, as in some examples shown on the forum. I have run the model with adapt_delta at 0.9, and then there were some divergences, but those seemed to be false positives. I have read Martin's blog post Identifying Non-identifiability, but I think my model is a bit different because here the response is binary.

I got the feeling from Martin's blog post that the issue might be related to observing the outcome Y only for largish values of X. Those values then lie in the tail of the sigmoid, so they give only little information about the parameters.
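To put a number on that: the information a Bernoulli observation carries about the linear predictor is proportional to the derivative of the sigmoid,

\begin{align} \sigma'(\eta) = \sigma(\eta)\,(1 - \sigma(\eta)), \end{align}

which is 0.25 at \eta = 0 but only about 0.018 at \eta = 4, so observations far in the tail constrain the coefficients very weakly.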

I would be very thankful for any advice. Please do ask if my explanations were not sufficient.

data {
  int<lower=1> D;      // Dimension of the feature and coefficient vectors
  int<lower=0> N_obs;  // Number of observed observations (with T = 1)
  int<lower=0> N_cens; // Number of censored observations (with T = 0)
  int<lower=0> M;      // Number of judges
  real<lower=0> sigma_tau; // Prior scale for the variance parameters
  int<lower=1, upper=M> jj_obs[N_obs];    // Judge ID array
  int<lower=1, upper=M> jj_cens[N_cens];  // Judge ID array
  int<lower=0, upper=1> dec_obs[N_obs];   // Decisions for the observed observations
  int<lower=0, upper=1> dec_cens[N_cens]; // Decisions for the censored observations
  row_vector[D] X_obs[N_obs];   // Features of the observed observations
  row_vector[D] X_cens[N_cens]; // Features of the censored observations
  int<lower=0, upper=1> y_obs[N_obs]; // Outcomes of the observed observations
}

parameters {
  // Latent variable
  vector[N_obs] Z_obs;
  vector[N_cens] Z_cens;

  // Intercepts and their variance parameters
  real<lower=0> sigma_T;
  real<lower=0> sigma_Y;

  vector[M] alpha_T_raw;
  real alpha_Y_raw;

  // Temporary variables to compute the coefficients
  vector[D] a_XT;
  vector[D] a_XY;
  real<lower=0> a_ZT; // Presume latent variable has a positive coefficient.
  real<lower=0> a_ZY;

  // Shared precisions for the coefficients
  real<lower=0> tau_X;
  real<lower=0> tau_Z;
}

transformed parameters {
  // Coefficients
  vector[D] beta_XT;
  vector[D] beta_XY;
  real<lower=0> beta_ZT; // Presume latent variable has a positive coefficient.
  real<lower=0> beta_ZY;

  // Intercepts
  vector[M] alpha_T;
  real alpha_Y;

  beta_XT = a_XT / sqrt(tau_X);
  beta_XY = a_XY / sqrt(tau_X);
  beta_ZT = a_ZT / sqrt(tau_Z);
  beta_ZY = a_ZY / sqrt(tau_Z);

  alpha_T = sigma_T * alpha_T_raw;
  alpha_Y = sigma_Y * alpha_Y_raw;
}

model {
  // Latent variable
  Z_obs ~ normal(0, 1);
  Z_cens ~ normal(0, 1);

  // Intercept priors ...

  // Gamma-mixture representation of the coefs
  a_XT ~ normal(0, 1);
  a_XY ~ normal(0, 1);
  a_ZT ~ normal(0, 1);
  a_ZY ~ normal(0, 1);

  // nu = 6 -> nu/2 = 3
  tau_X ~ gamma(3, 3);
  tau_Z ~ gamma(3, 3);

  // Regressions for the observed observations
  for (i in 1:N_obs) {
    dec_obs[i] ~ bernoulli_logit(alpha_T[jj_obs[i]] + X_obs[i] * beta_XT + beta_ZT * Z_obs[i]);
    y_obs[i] ~ bernoulli_logit(alpha_Y + X_obs[i] * beta_XY + beta_ZY * Z_obs[i]);
  }

  // Regression for the censored observations
  for (i in 1:N_cens)
    dec_cens[i] ~ bernoulli_logit(alpha_T[jj_cens[i]] + X_cens[i] * beta_XT + beta_ZT * Z_cens[i]);
}
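For completeness, the prediction step I mentioned would go in generated quantities, roughly like this (a sketch; the name y_cens_pred is just illustrative):

```stan
generated quantities {
  // Posterior predictive draws of the unobserved outcomes (T = 0 cases)
  int<lower=0, upper=1> y_cens_pred[N_cens];
  for (i in 1:N_cens)
    y_cens_pred[i] = bernoulli_logit_rng(alpha_Y + X_cens[i] * beta_XY + beta_ZY * Z_cens[i]);
}
```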



If you’re going for this:

alpha_T ~ normal(0, sigma_T);
alpha_Y ~ normal(0, sigma_Y);


you’ll need to add normal(0, 1) priors on the corresponding raw parameters.

This corresponds to:

vector[M] alpha_T_raw;
real alpha_Y_raw;


in parameters,

alpha_T = sigma_T * alpha_T_raw;
alpha_Y = sigma_Y * alpha_Y_raw;


in transformed parameters, and

alpha_T_raw ~ normal(0, 1);
alpha_Y_raw ~ normal(0, 1);


in the model block. (check the non-centered parameterization stuff here: Stan User’s Guide and here: Diagnosing Biased Inference with Divergences)

Also, if you don’t have any reason to do otherwise, probably go with plain normal priors. You can get E-BFMI warnings with heavy-tailed distributions.
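For example, something like this (a sketch that drops the gamma-mixture machinery; the unit scales are just placeholders):

```stan
parameters {
  vector[D] beta_XT;
  vector[D] beta_XY;
  real<lower=0> beta_ZT; // Keep the positivity constraints from the original model
  real<lower=0> beta_ZY;
}
model {
  // Plain normal priors instead of the t_6 scale mixture
  beta_XT ~ normal(0, 1);
  beta_XY ~ normal(0, 1);
  beta_ZT ~ normal(0, 1); // Half-normal due to the lower bound
  beta_ZY ~ normal(0, 1);
}
```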

Edit: Also welcome to the forums!