Stan_glm() in rstanarm R package - poor fit to data?

areyoujokingme · May 21, 2019, 9:03pm

I’m trying to model the effect of a measured variable on my continuous outcome variable. I know logistic regression requires a discrete outcome variable, but a colleague of mine recommended I try “Bayesian logistic regression” anyway.

The data I have that I want to fit a model to looks like the below:

When I fit a stan_glm() like the below, I get the fit indicated by the line drawn over the scatter plot in the above image. To me, this fit is super poor, because most of the datapoints are in the upper left corner of the scatter plot, yet the model fit is super gradual in its approach towards 1.0 for the outcome variable.

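library(rstanarm)

# Fit a binomial GLM with a logit link: outcome y.var on measured variable m
res.model <- stan_glm(y.var ~ m,
             data = dff,
             family = binomial(link = 'logit'),
             prior = student_t(df = 3, location = 0.5, scale = 1.0),
             prior_intercept = student_t(df = 7, location = 0, scale = 1.0),
             cores = 4)
# Posterior predictive draws on a grid of m values (rows = draws, columns = grid points)
m.grid <- seq(0, 50, 0.5)
y.post <- posterior_predict(res.model, newdata = data.frame(m = m.grid))
# Average over draws to get the mean predicted outcome at each grid point
pp.post <- colMeans(y.post)
# Scatter plot of the data with the fitted curve drawn on top
plot(dff$m, dff$y.var, pch = 16, xlab = "Measured variable score", ylab = "Outcome score", cex = 2, cex.lab = 2, cex.axis = 2)
lines(m.grid, pp.post)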

Can anyone help explain why the fit is so poor, and give any tips about what to do with my data?

ldeschamps · May 21, 2019, 9:19pm

Hello!

Is your response variable a proportion (i.e. continuous between 0 and 1)? In that case, it could be beta distributed, and you should look at the stan_betareg function ;)
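
To make the beta suggestion concrete, here is a minimal base-R sketch of the likelihood a beta regression assumes: the mean is tied to the predictor through a logit link, and a precision parameter phi controls the spread. The data are simulated and the names (x, y, negloglik, est) are made up for illustration. It uses plain maximum likelihood via optim(), so it only illustrates the model and is not what stan_betareg() does internally.

```r
# Simulated data (assumption: these are NOT the poster's data).
set.seed(2)
n <- 200
x <- runif(n, 0, 6)
mu_true <- plogis(-1 + 0.3 * x)   # mean in (0, 1) through the logit link
phi_true <- 20                    # precision: larger means less spread
y <- rbeta(n, mu_true * phi_true, (1 - mu_true) * phi_true)

# Negative log-likelihood of a beta regression.
# par = (intercept, slope, log(phi)); phi is kept positive via exp().
negloglik <- function(par) {
  mu <- plogis(par[1] + par[2] * x)
  phi <- exp(par[3])
  -sum(dbeta(y, mu * phi, (1 - mu) * phi, log = TRUE))
}

# Maximum likelihood fit, starting from a flat curve with phi = 1.
fit <- optim(c(0, 0, 0), negloglik, method = "BFGS")
est <- c(intercept = fit$par[1], slope = fit$par[2], phi = exp(fit$par[3]))
print(round(est, 2))
```

The outcome has to lie strictly between 0 and 1 for this likelihood to be defined, which is why the question about your response variable matters.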

Cheers,
Lucas

areyoujokingme · May 21, 2019, 9:37pm

I tried that with the betareg R package as well as the stan_betareg() function, and I get the same error about my dependent variable needing to be on the same scale as my outcome variable. My dependent variable is bounded below by 0 and unbounded above. I worry that rescaling it to the range 0 to 1 would be unwise.

ldeschamps · May 21, 2019, 10:03pm

Oh sorry, I thought your dependent variable was restricted between zero and one (because of your plot and the idea of using a logit link function).

If it is not, I am afraid I am not sure what to do. However, I am afraid a binomial distribution is a bad choice, because I do not see anything in your data that counts as a “number of successes”.

Note that “logistic” refers to a mathematical function, and that its use is not restricted to the distributions for which it can serve as the link function. It is possible to estimate the parameters of logistic or logistic-like functions in non-linear models, or in differential equation models.
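
As a rough sketch of the non-linear model idea, a logistic curve can be fitted directly with nls() from base R, with Gaussian noise around the curve instead of a binomial likelihood. The data below are simulated, and the parameter names (Asym, xmid, scal) and starting values are my own choices for illustration, so treat this as one possible setup, not a recommendation for your data.

```r
# Simulated data (assumption: these are NOT the poster's data).
set.seed(1)
m <- seq(0, 50, by = 0.5)
# True curve: asymptote 1, midpoint 5, scale 2, plus Gaussian noise.
y <- 1 / (1 + exp(-(m - 5) / 2)) + rnorm(length(m), sd = 0.05)
sim <- data.frame(m = m, y = y)

# Non-linear least squares fit of a three-parameter logistic function.
# Starting values are rough guesses read off the shape of the data.
fit <- nls(y ~ Asym / (1 + exp(-(m - xmid) / scal)),
           data = sim,
           start = list(Asym = 1, xmid = 4, scal = 3))
est <- coef(fit)
print(round(est, 2))

# Fitted curve evaluated on the same grid of m values.
y.hat <- predict(fit, newdata = sim)
```

Here the asymptote is estimated from the data, so the outcome does not need to be bounded by 1.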

Lucas
