# Logistic regression with measurement errors / latent variable - low E-BFMI

Hello,

I am trying to fit a logistic regression model with a latent variable (or, equivalently, to perform the regression with measurement errors). This post (Non-centered parameterization of measurement error model) dealt with a similar problem, which was solved just by constraining the sd to the positive real axis. I have been having issues with low E-BFMI and some scale issues.

More specifically, I am working with two separate logistic regression models, shown in abridged form below. In the model we have unobserved noise Z and observed features X of individuals, both affecting a decision T. If the decision is positive (T = 1), we can observe a binary outcome Y; otherwise we cannot. The M decision-makers have different baseline probabilities for a positive decision, modelled with the intercepts \alpha_t. Eventually we are interested in predicting the value of Y for those who were given a negative decision.

In the model, the \beta coefficients have t_6 priors as per your recommendations, and the intercepts have been given hierarchical priors. The coefficient priors have all been non-centered using the gamma-mixture representation shown in the manual.
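For reference, the representation I am using is (if I have understood the manual correctly) the standard scale-mixture-of-normals form of the Student-t, with \nu = 6 matching the gamma(3, 3) priors in the code below:

\begin{align}
a &\sim N(0, 1) \\
\tau &\sim \text{Gamma}(\nu / 2, \nu / 2) \\
\beta = a / \sqrt{\tau} &\sim t_\nu(0, 1)
\end{align}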

\begin{align}
Z &\sim N(0, 1) \\
T &\sim \text{Bernoulli}(\sigma(\alpha_t + \beta_{xt} x + \beta_{zt} z)), ~\text{where}~ t = 1, \ldots, M \\
Y \mid T = 1 &\sim \text{Bernoulli}(\sigma(\alpha_y + \beta_{xy} x + \beta_{zy} z))
\end{align}

The issue with this model is that even when the data have been generated exactly according to the model, I have been getting warnings about low E-BFMI (and occasionally the coefficients \beta_{xy} and \beta_{zy} have come out too large, though similar to each other). The E-BFMI is usually between 0.2 and 0.3, sometimes above and sometimes below. According to some discussion (if I remember correctly it was Mr. Carpenter who said it), that is the region where one should be concerned. There haven't been issues with divergences like in some of the examples shown on the forum. I have run the model with adapt_delta at 0.9, and then there were some divergences, but those seemed to be false positives. I have read Martin's blog post Identifying Non-identifiability, but I think my model is a bit different because here the response is binary.

I got the feeling from Martin's blog post that the issue might be related to having only largish values of X observed for the outcome Y. Those values land in the tail of the sigmoid, which gives only a little information about the parameters, as the small worked example below illustrates.
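To make that concrete: the derivative of the sigmoid is \sigma'(\eta) = \sigma(\eta)(1 - \sigma(\eta)), so for a largish linear predictor such as \eta = 4,

\begin{align}
\sigma(4) \approx 0.982, \qquad \sigma'(4) = \sigma(4)(1 - \sigma(4)) \approx 0.018,
\end{align}

and the likelihood is nearly flat in the coefficients for such observations.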

I would be very thankful for any advice. Please do ask if my explanations are not sufficient.

data {
  int<lower=1> D;      // Dimension of the feature and coefficient vectors
  int<lower=0> N_obs;  // Number of observed observations (with T = 1)
  int<lower=0> N_cens; // Number of censored observations (with T = 0)
  int<lower=0> M;      // Number of judges
  real<lower=0> sigma_tau; // Prior scale for the variance parameters
  int<lower=1, upper=M> jj_obs[N_obs];    // Judge ID array
  int<lower=1, upper=M> jj_cens[N_cens];  // Judge ID array
  int<lower=0, upper=1> dec_obs[N_obs];   // Decisions for the observed observations
  int<lower=0, upper=1> dec_cens[N_cens]; // Decisions for the censored observations
  row_vector[D] X_obs[N_obs];   // Features of the observed observations
  row_vector[D] X_cens[N_cens]; // Features of the censored observations
  int<lower=0, upper=1> y_obs[N_obs]; // Outcomes of the observed observations
}

parameters {
  // Latent variables
  vector[N_obs] Z_obs;
  vector[N_cens] Z_cens;

  // Intercepts and their variance parameters
  real<lower=0> sigma_T;
  real<lower=0> sigma_Y;

  vector[M] alpha_T_raw;
  real alpha_Y_raw;

  // Temporary variables to compute the coefficients
  vector[D] a_XT;
  vector[D] a_XY;
  real<lower=0> a_ZT; // Presume the latent variable has a positive coefficient.
  real<lower=0> a_ZY;

  // Shared precisions for the coefficients
  real<lower=0> tau_X;
  real<lower=0> tau_Z;
}

transformed parameters {
  // Coefficients
  vector[D] beta_XT;
  vector[D] beta_XY;
  real<lower=0> beta_ZT; // Presume the latent variable has a positive coefficient.
  real<lower=0> beta_ZY;

  // Intercepts
  vector[M] alpha_T;
  real alpha_Y;

  beta_XT = a_XT / sqrt(tau_X);
  beta_XY = a_XY / sqrt(tau_X);
  beta_ZT = a_ZT / sqrt(tau_Z);
  beta_ZY = a_ZY / sqrt(tau_Z);

  alpha_T = sigma_T * alpha_T_raw;
  alpha_Y = sigma_Y * alpha_Y_raw;
}

model {
  // Latent variable priors
  Z_obs ~ normal(0, 1);
  Z_cens ~ normal(0, 1);

  // Intercept priors ...

  // Gamma-mixture representation of the coefficients
  a_XT ~ normal(0, 1);
  a_XY ~ normal(0, 1);
  a_ZT ~ normal(0, 1);
  a_ZY ~ normal(0, 1);

  // nu = 6 -> nu/2 = 3
  tau_X ~ gamma(3, 3);
  tau_Z ~ gamma(3, 3);

  // Regressions for the observed observations
  for (i in 1:N_obs) {
    dec_obs[i] ~ bernoulli_logit(alpha_T[jj_obs[i]] + X_obs[i] * beta_XT + beta_ZT * Z_obs[i]);
    y_obs[i] ~ bernoulli_logit(alpha_Y + X_obs[i] * beta_XY + beta_ZY * Z_obs[i]);
  }

  // Regression for the censored observations
  for (i in 1:N_cens)
    dec_cens[i] ~ bernoulli_logit(alpha_T[jj_cens[i]] + X_cens[i] * beta_XT + beta_ZT * Z_cens[i]);
}
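For completeness, the likelihood loops could equivalently be written in vectorized form if X_obs and X_cens were declared as matrices (matrix[N_obs, D] X_obs; matrix[N_cens, D] X_cens;) instead of arrays of row vectors. A sketch of the same likelihood, restructured:

model {
  // Vectorized likelihood: alpha_T[jj_obs] picks out each observation's judge intercept
  dec_obs ~ bernoulli_logit(alpha_T[jj_obs] + X_obs * beta_XT + beta_ZT * Z_obs);
  y_obs ~ bernoulli_logit(alpha_Y + X_obs * beta_XY + beta_ZY * Z_obs);
  dec_cens ~ bernoulli_logit(alpha_T[jj_cens] + X_cens * beta_XT + beta_ZT * Z_cens);
}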



If you're going for this:

alpha_T ~ normal(0, sigma_T);
alpha_Y ~ normal(0, sigma_Y);


You'll need to add some normal(0, 1) priors on these.

This corresponds to:

vector[M] alpha_T_raw;
real alpha_Y_raw;


in parameters,

alpha_T = sigma_T * alpha_T_raw;
alpha_Y = sigma_Y * alpha_Y_raw;


in transformed parameters, and

alpha_T_raw ~ normal(0, 1);
alpha_Y_raw ~ normal(0, 1);


in the model block. (Check the non-centered parameterization stuff here: Stan User's Guide and here: Diagnosing Biased Inference with Divergences.)

Also, if you don't have any reason to do otherwise, probably just roll with normal priors. You can get E-BFMI warnings with heavy-tailed distributions.
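For example, a minimal sketch of that simplification, using the names from your model (drop the a_* and tau_* variables entirely and declare the coefficients directly):

parameters {
  vector[D] beta_XT;
  vector[D] beta_XY;
  real<lower=0> beta_ZT;
  real<lower=0> beta_ZY;
  // ... rest as before
}

model {
  // Plain normal priors instead of the t_6 gamma-mixture
  beta_XT ~ normal(0, 1);
  beta_XY ~ normal(0, 1);
  beta_ZT ~ normal(0, 1);
  beta_ZY ~ normal(0, 1);
  // ... rest as before
}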

Edit: Also welcome to the forums!