stan_glm() in the rstanarm R package - poor fit to data?

I’m trying to model the effect of a measured variable on my continuous outcome variable. I know logistic regression requires a discrete outcome variable, but a colleague of mine recommended I try “Bayesian logistic regression” anyway.

The data I want to fit a model to look like this:

When I fit a stan_glm() model as shown below, I get the fit indicated by the line drawn over the scatter plot in the image above. To me, this fit is very poor: most of the data points are in the upper left corner of the scatter plot, yet the fitted curve approaches 1.0 for the outcome variable only very gradually.

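library(rstanarm)

# Logistic (binomial) regression of the outcome on the measured variable
res.model = stan_glm(y.var ~ m,
                     data = dff,
                     family = binomial(link = 'logit'),
                     prior = student_t(df = 3, location = 0.5, scale = 1.0),
                     prior_intercept = student_t(df = 7, location = 0, scale = 1.0),
                     cores = 4)

# Posterior predictive draws on a grid of m, averaged over draws at each grid point
m.grid = seq(0, 50, 0.5)
y.post = posterior_predict(res.model, newdata = data.frame(m = m.grid))
pp.post = colMeans(y.post)

# Scatter plot of the data with the mean posterior prediction overlaid
plot(dff$m, dff$y.var, pch = 16, xlab = "Measured variable score", ylab = "Outcome score", cex = 2, cex.lab = 2, cex.axis = 2)
lines(m.grid, pp.post)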

Can anyone help explain why the fit is so poor, and give any tips about what to do with my data?

Hello!

Is your response variable a proportion (i.e. continuous between 0 and 1)? In that case, it could be beta distributed, and you should look at the stan_betareg function ;)
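Before trying a beta model, it may help to check whether the response really lies strictly between 0 and 1, since the beta distribution has no mass at exactly 0 or 1. Here is a minimal sketch of such a check in base R. The helper name in_open_unit_interval and the two example vectors are made up for illustration and are not part of rstanarm or betareg.

```r
# Hypothetical helper (not part of rstanarm or betareg): returns TRUE only if
# every value is non-missing and strictly inside the open interval (0, 1).
in_open_unit_interval <- function(y) {
  all(!is.na(y) & y > 0 & y < 1)
}

# Made-up example vectors, for illustration only
y_ok  <- c(0.12, 0.50, 0.97)   # proportions strictly inside (0, 1)
y_bad <- c(0.00, 0.50, 1.00)   # contains exact 0 and 1, outside the beta support

print(in_open_unit_interval(y_ok))
print(in_open_unit_interval(y_bad))
print(range(y_bad))            # shows which boundary values are present
```

With real data you would call the helper on the outcome column, e.g. in_open_unit_interval(dff$y.var), assuming dff and y.var are named as in the question.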

Cheers,
Lucas

I tried that with the betareg R package as well as the stan_betareg() function, and I get the same error regarding my dependent variables needing to be the same scale as my outcome variable. My dependent variable is unbounded on the right limit, and bounded by 0 on the left. I worry scaling it to 0 to 1 will be unwise.

Oh sorry, I thought your dependent variable was restricted between zero and one (because of your plot and the idea of using a logit link function).

If it is not, I am afraid I am not sure what to do. However, I am afraid a binomial distribution is a bad choice, because I do not see anything that counts as a “number of successes”.

Note that “logistic” refers to a mathematical function, and that its use is not restricted to the distributions for which it can serve as the link function. It is possible to estimate the parameters of logistic or logistic-like functions in non-linear models, or in differential equation models.
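As one possible illustration of the non-linear route, here is a minimal sketch that fits a logistic curve by non-linear least squares with nls() and the self-starting SSlogis() model from base R's stats package. This is not a Bayesian fit and not necessarily the right model for the data in the question. The data are simulated, with column names m and y.var chosen only to mirror the question, and the true parameter values (asymptote 1, midpoint 10, scale 3) are assumptions made up for the example.

```r
# Simulated data for illustration only; names mirror the question,
# values and true parameters are made up.
set.seed(42)
m <- seq(0, 50, by = 0.5)
true_curve <- 1 / (1 + exp((10 - m) / 3))          # Asym = 1, xmid = 10, scal = 3
dff <- data.frame(m = m,
                  y.var = true_curve + rnorm(length(m), sd = 0.03))

# SSlogis(input, Asym, xmid, scal) = Asym / (1 + exp((xmid - input) / scal));
# being self-starting, it picks its own initial values for nls().
fit <- nls(y.var ~ SSlogis(m, Asym, xmid, scal), data = dff)

est <- coef(fit)                                   # estimated Asym, xmid, scal
print(round(est, 2))

# Fitted values on the original grid of m
fitted_curve <- predict(fit, newdata = dff)
```

To compare the curve with the data, something like plot(dff$m, dff$y.var) followed by lines(dff$m, fitted_curve) would overlay the fit on the scatter plot. A curve of this kind can rise steeply and then flatten, which a straight line on the logit scale with a binomial likelihood may not capture when the outcome is not a count of successes.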

Lucas