Including a probability as predictor for logistic regression


#1

I’m building a Stan model where each binary observation y_{i} is imagined to be the result of some base probability q_{i} that is further modified by my other predictors. I have a noisy point estimate for each q_{i} and my question is how to best include this information in the model.

Right now, my model looks like this (I use a regularized horseshoe prior for my betas and student t priors for k and the intercept b_0):

data {
  int<lower=1> N;
  int<lower=1> M;

  int<lower=0, upper=1> y[N];
  matrix[N, M] X;
  vector[N] q;
}

parameters {
  real b0;
  vector[M] beta;
  real k;
}

model {
  y ~ bernoulli_logit(b0 + k * logit(q) + X * beta);
}

This usually works fine, but sometimes my point estimates for q_{i} are exactly 0 or 1, and then the model blows up because logit(0) is -Inf and logit(1) is Inf. I often don’t trust these point estimates of q_{i}, and I want the model to decide to what extent it makes use of this information (this is what I try to achieve with the parameter k).

I guess the quick and dirty way is to truncate each q_{i} to a range between 0.01 and 0.99 (or whatever), but there surely has to be a better way?
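For reference, the quick and dirty version could be sketched roughly as below. The clamp width eps = 0.01 and the name logit_q are just placeholders for illustration, priors are left out, and the block uses the array[N] int declaration that recent Stan versions expect instead of int y[N].

```stan
data {
  int<lower=1> N;
  int<lower=1> M;

  array[N] int<lower=0, upper=1> y;
  matrix[N, M] X;
  vector<lower=0, upper=1>[N] q;
}

transformed data {
  real eps = 0.01;     // assumed clamp width, chosen arbitrarily
  vector[N] logit_q;   // logit of the clamped point estimates
  for (n in 1:N) {
    // push q[n] into [eps, 1 - eps] so that logit() stays finite
    logit_q[n] = logit(fmin(fmax(q[n], eps), 1 - eps));
  }
}

parameters {
  real b0;
  vector[M] beta;
  real k;
}

model {
  // priors on b0, beta and k omitted here
  y ~ bernoulli_logit(b0 + k * logit_q + X * beta);
}
```

The obvious downside is that the results then depend on an arbitrary choice of eps, since it fixes how extreme logit_q can get for the observations with q equal to 0 or 1.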

Thank you for your advice

Daniel


#2

If you know that the observed q is noisy, you could treat the true q as a parameter and model q_obs (the quantity your data block currently calls q) as an observation of it, using a distribution whose density does not vanish at the edges 0 and 1. Depending on the problem, you could also put the latent parameter on the logit scale and model logit(q) directly this way.
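A minimal sketch of this idea, under assumptions that are mine and not from your model: the latent logit(q) is called eta, the measurement noise is normal on the probability scale with an unknown scale sigma_q, and the priors on eta and sigma_q are placeholders you would want to replace. The block uses the array[N] int declaration of recent Stan versions.

```stan
data {
  int<lower=1> N;
  int<lower=1> M;

  array[N] int<lower=0, upper=1> y;
  matrix[N, M] X;
  vector<lower=0, upper=1>[N] q_obs;  // noisy point estimates, may be exactly 0 or 1
}

parameters {
  real b0;
  vector[M] beta;
  real k;
  vector[N] eta;          // latent logit of the true base probability (assumed name)
  real<lower=0> sigma_q;  // measurement noise of q_obs (assumed unknown)
}

model {
  // placeholder priors, not taken from the original model
  eta ~ normal(0, 2);
  sigma_q ~ normal(0, 0.2);  // half-normal because of the lower bound

  // measurement model: the normal density is non-zero at q_obs = 0 and q_obs = 1,
  // so exact 0/1 observations no longer produce infinities
  q_obs ~ normal(inv_logit(eta), sigma_q);

  // priors on b0, beta and k omitted here
  y ~ bernoulli_logit(b0 + k * eta + X * beta);
}
```

Note that logit() is never applied to the data here, only inv_logit() to the latent eta, which is finite by construction. The untruncated normal ignores that q_obs is confined to [0, 1]; whether that matters, and whether a different noise distribution fits better, depends on how the point estimates were produced.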