I’m trying to model the effect of a measured variable on my continuous outcome variable. I know standard logistic regression expects a binary (0/1) or binomial count outcome, but a colleague recommended that I try “Bayesian logistic regression” anyway.
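As a point of reference, logistic regression models the mean of the outcome as a logistic function of a linear predictor, E[y | m] = plogis(a + b * m), so the mean function itself does not require y to be exactly 0 or 1. The base-R sketch below assumes the outcome is a score bounded in [0, 1]. The fitted curve approaching 1.0 suggests this, but it is an assumption. It simulates hypothetical data (`true_a`, `true_b`, `phi`, and the beta-distributed noise are all made up for illustration) and fits a frequentist quasi-binomial GLM with the same logit link. It is only a comparison point, not the Bayesian model my colleague suggested.

```r
# Hypothetical simulated data: an outcome bounded in (0, 1) whose mean follows
# a logistic curve in m. None of these values come from the real data set.
set.seed(1)
n <- 500
true_a <- -2      # assumed intercept on the log-odds scale
true_b <- 0.1     # assumed slope on the log-odds scale
phi <- 30         # assumed beta precision: larger means less noise around the mean
m <- runif(n, 0, 50)
mu <- plogis(true_a + true_b * m)           # logistic mean function E[y | m]
y <- rbeta(n, mu * phi, (1 - mu) * phi)     # continuous outcome with mean mu
sim <- data.frame(m = m, y = y)

# Quasi-binomial GLM: same logit link and mean function as logistic regression,
# but the outcome only needs to lie in [0, 1], not be exactly 0 or 1.
fit <- glm(y ~ m, family = quasibinomial(link = "logit"), data = sim)
print(coef(fit))

# Fitted mean curve over the same grid of m values used in the question
m_grid <- seq(0, 50, 0.5)
fitted_curve <- predict(fit, newdata = data.frame(m = m_grid), type = "response")
```

The quasi-binomial family keeps the logistic mean curve but estimates a separate dispersion parameter instead of assuming Bernoulli variance, which is why it accepts fractional outcomes without warnings.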
The data I want to fit a model to look like this:
[Figure: scatter plot of outcome score against measured variable score, with the fitted curve overlaid]
When I fit a stan_glm() model as below, I get the fit shown by the line drawn over the scatter plot in the image above. This fit looks very poor to me: most of the data points sit in the upper-left corner of the scatter plot, yet the fitted curve approaches 1.0 for the outcome variable only very gradually.
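library(rstanarm)

# Logistic regression (binomial family, logit link) of y.var on the measured variable m
res.model = stan_glm(y.var ~ m,
                     data = dff,
                     family = binomial(link = "logit"),
                     prior = student_t(df = 3, location = 0.5, scale = 1.0),          # prior on the slope of m (log-odds scale)
                     prior_intercept = student_t(df = 7, location = 0, scale = 1.0),  # prior on the intercept (applied with predictors centered)
                     cores = 4)

# Posterior predictive draws on a grid of m values, averaged over draws at each grid point
m.grid = seq(0, 50, 0.5)
y.post = posterior_predict(res.model, newdata = data.frame(m = m.grid))
pp.post = colMeans(y.post)  # same as apply(y.post, 2, sum) / nrow(y.post)

plot(dff$m, dff$y.var, pch = 16, xlab = "Measured variable score", ylab = "Outcome score",
     cex = 2, cex.lab = 2, cex.axis = 2)
lines(m.grid, pp.post)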
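The plotted line is the column-wise average of the posterior_predict() draws, i.e. a Monte Carlo estimate of the posterior mean outcome at each value of m. Assuming each draw is a 0/1 outcome (as it is for a Bernoulli model), that average estimates the mean of plogis(a + b * m) over the posterior draws, not the curve at a single "best" a and b. The base-R sketch below mimics this with made-up posterior draws: `draws_a`, `draws_b`, their normal distributions, and every number are hypothetical and do not come from stan_glm().

```r
# Hypothetical posterior draws for the intercept and slope (log-odds scale);
# in the real analysis these would come from the fitted stan_glm model.
set.seed(2)
S <- 4000
draws_a <- rnorm(S, mean = 0.5, sd = 0.4)
draws_b <- rnorm(S, mean = 0.08, sd = 0.03)
m_grid <- seq(0, 50, 0.5)

# S x length(m_grid) matrix of success probabilities, one row per posterior draw
eta <- outer(draws_b, m_grid) + draws_a     # draws_a recycles down each column
prob <- matrix(plogis(eta), nrow = S)

# Simulated posterior predictive draws: one 0/1 outcome per draw and grid point,
# analogous to the matrix returned by posterior_predict()
y_post <- matrix(rbinom(length(prob), size = 1, prob = prob), nrow = S)

pp_post <- colMeans(y_post)     # the line in the question: proportion of 1s per column
pp_epred <- colMeans(prob)      # posterior mean of the success probability
pp_plugin <- plogis(mean(draws_a) + mean(draws_b) * m_grid)  # curve at posterior-mean coefficients

print(round(cbind(m = m_grid, pp_post, pp_epred, pp_plugin)[seq(1, 101, by = 20), ], 3))
```

With uncertain coefficients, the averaged curve (`pp_epred`, which `pp_post` approximates with extra Monte Carlo noise) can be flatter than the curve at the posterior-mean coefficients (`pp_plugin`). Recent versions of rstanarm also provide `posterior_epred()`, which returns expected values directly instead of 0/1 draws.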
Can anyone explain why the fit is so poor, and suggest how I should model these data?