Binomial regression with a log link

Computing the RR via posterior_linpred is the right way to go.

OK, another thing please: if I want to compute the absolute risk reduction, what should I do?
ARR <- Pr_1 - Pr_0??
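
Something along these lines, I suppose (taking Pr_0 and Pr_1 to be the posterior_linpred probability matrices from the code further down):

ARR <- Pr_1 - Pr_0                     # per-draw, per-observation risk difference (sign convention is up to you)
arr_draws <- rowMeans(ARR)             # average over observations within each posterior draw
mean(arr_draws)                        # posterior mean of the risk difference
quantile(arr_draws, c(0.025, 0.975))   # 95% credible interval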

What is Total in offset(Total)?
I had a discussion with @bgoodri below; could you please give me feedback?

If I also use family = gaussian(link = "log") here, it works well and gives RR values closer to those obtained with WinBUGS:

library(rstanarm)
post <- stan_glm(y ~ treatment + x,
                 data = dataset,
                 family = gaussian(link = "log"),
                 prior = normal(location = 0, scale = 10, autoscale = FALSE),
                 prior_intercept = NULL,
                 prior_aux = NULL,
                 QR = TRUE)
nd <- dataset
nd$treatment <- 0
Pr_0 <- posterior_linpred(post, newdata = nd, transform = TRUE)  # predicted risk with treatment = 0
nd$treatment <- 1
Pr_1 <- posterior_linpred(post, newdata = nd, transform = TRUE)  # predicted risk with treatment = 1
RR <- Pr_1 / Pr_0  # posterior draws of the relative risk
dim(RR)

The y and treat variables are binary, but they were defined with as.integer in R. Could anyone confirm whether gaussian(link = "log") works for computing the RR with this approach?

You can do all that posterior_linpred stuff to generate the posterior distribution of the relative risk without doing family = gaussian(link = "log") in stan_glm. Just do family = binomial(link = "logit") (or link = "probit") so that the inverse link function yields probabilities for all inputs.
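
For example, a minimal sketch along those lines (reusing the dataset, treatment, and x names from the earlier code):

post <- stan_glm(y ~ treatment + x,
                 data = dataset,
                 family = binomial(link = "logit"),
                 prior = normal(location = 0, scale = 10, autoscale = FALSE),
                 prior_intercept = NULL,
                 QR = TRUE)
nd <- dataset
nd$treatment <- 0
Pr_0 <- posterior_linpred(post, newdata = nd, transform = TRUE)  # P(y = 1 | treatment = 0)
nd$treatment <- 1
Pr_1 <- posterior_linpred(post, newdata = nd, transform = TRUE)  # P(y = 1 | treatment = 1)
RR <- Pr_1 / Pr_0  # posterior draws of the relative risk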

I fully agree with you. However, as much as I like this package, there are many things that are strange and confusing to me. For example, if you want to compute the ORs your way:

> mydata = data.frame(treat = c( rep(0, 68), rep(1, 80), rep(0, 57), rep(1, 44)),
+                     death = c( rep(0, 68), rep(0, 80), rep(1, 57), rep(1, 44) ) )
> post <- stan_glm(death~treat, 
+                  data = mydata, 
+                  family =binomial(link = "logit"), 
+                  prior=normal(location = 0, scale = 10, autoscale = F),
+                  prior_intercept = NULL,
+                  prior_aux = NULL,
+                  QR = FALSE,
+                  chains = 3,
+                  seed=123456,
+                  refresh=0)
> # if I want to compute the OR from the sample data
> samples = as.matrix(post)
> ORs = exp(samples[,"treat"])
> mean(ORs)
[1] 0.6804256
> ## while 
> exp(coef(post))
(Intercept)       treat 
  0.8387465   0.6579191 
> ## while 
> exp(coef(post))["treat"]
    treat 
0.6579191 
> ## post online 
> mydata = data.frame(treat = c( rep(0, 68), rep(1, 80), rep(0, 57), rep(1, 44)),
+                     Dc = c( rep(0, 68), rep(0, 80), rep(1, 57), rep(1, 44) ) )
> 
> post <- stan_glm(Dc~treat, 
+                  data = mydata, 
+                  family =binomial(link = "logit"), 
+                  prior=normal(location = 0, scale = 10, autoscale = F),
+                  prior_intercept = NULL,
+                  prior_aux = NULL,
+                  QR = FALSE,
+                  chains = 3,
+                  seed=123456,
+                  refresh=0)
> 
> # if I want to compute the OR from the sample data
> samples = as.matrix(post)
> ORs = exp(samples[,"treat"])
> mean(ORs)
[1] 0.6804256
> 
> ## while 
> exp(coef(post))["treat"]
    treat 
0.6579191 
> ## But note that 
> OR = exp(mean(samples[,"treat"]))
> OR
[1] 0.6572938

What is the right way, since sometimes you want the sample of ORs to do further computations?
And where do these differences come from?

Always take the mean last (irrespective of what function produced the posterior draws). So, mean(exp(as.data.frame(post)$treat)) is right and exp(mean(as.data.frame(post)$treat)) is horribly wrong.

OK, but OR = exp(coef(post)["treat"]) = 0.6579191 in the direct way (the frequentist way):

> mean(exp(as.data.frame(post)$treat))
[1] 0.6804256

> exp(coef(post)["treat"])
    treat 
0.6579191 
# The difference is: 
> mean(exp(as.data.frame(post)$treat)) -  exp(coef(post)["treat"])
     treat 
0.02250656 

why this difference?

Because mean(exp(as.data.frame(post)$treat)) is right, while both exp(coef(post)["treat"]) and exp(mean(as.data.frame(post)$treat)) are horribly wrong. So, they cannot be equal.
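
The order matters because exp() is convex: averaging after exponentiating always gives at least as large a value as exponentiating the average (Jensen's inequality). A quick illustration with simulated draws (not the model above):

set.seed(1)
b <- rnorm(4000, mean = -0.42, sd = 0.25)  # stand-in for posterior draws of a log odds ratio
mean(exp(b))   # mean taken last  -- the posterior mean of the OR
exp(mean(b))   # mean taken first -- smaller, not the posterior mean of the OR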

Thanks a lot :)

I found this article, which confirms that we can use log Gaussian and log Poisson in addition to log binomial to estimate the RR:

https://journals.sagepub.com/doi/pdf/10.1177/1536867X0900900201

Section 5 of that paper barely makes sense as a Frequentist estimator. Using a Gaussian likelihood for binary outcomes is contrary to the generative nature of Bayesian modeling.

Thanks Ben, do you have any suggestion on how to approach a continuous predictor in this way?
I have also resorted to the counterfactual conditional predicted probabilities to estimate relative risks, but I couldn't find a smart way to do this with continuous variables unless I pick reference values to compare against a baseline (I usually compare the 30th, 50th, 70th, and 90th percentiles against the 10th).
Is there a way to estimate one linear RR with its posterior distribution?
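
For what it's worth, here is a rough sketch of that percentile comparison (post and x are placeholders for a fitted logistic stan_glm and the continuous predictor in dataset):

qs <- quantile(dataset$x, probs = c(0.10, 0.30, 0.50, 0.70, 0.90))
nd <- dataset
nd$x <- qs["10%"]
Pr_ref <- posterior_linpred(post, newdata = nd, transform = TRUE)  # baseline: 10th percentile
RR_draws <- sapply(c("30%", "50%", "70%", "90%"), function(q) {
  nd$x <- qs[q]
  Pr_q <- posterior_linpred(post, newdata = nd, transform = TRUE)
  rowMeans(Pr_q / Pr_ref)  # per-draw average RR vs. the 10th percentile
})
colMeans(RR_draws)  # posterior mean RR at each percentile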

I was wondering, would it be possible to impose a prior on eta to force it below 0 or would it mess up interpretation?

You are only going to get a constant RR with a log link.

Uhm, I’m not sure what you mean.
I meant using the predictions from a logistic regression as you explained. Is there a way to get the RR for a general unit increase, assuming a constant (one beta) effect?

I mean I think the RR is always going to depend in a nonlinear way on what levels of the predictors you predict at, unless the log link were used when fitting the model.
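
A toy illustration of that point, with made-up coefficients rather than a fitted model:

a <- -1          # made-up intercept
b_treat <- 0.5   # made-up treatment coefficient
b_x <- 0.8       # made-up covariate coefficient
x <- c(-1, 0, 1)
# Logit link: the treated vs. untreated probability ratio changes with x
plogis(a + b_treat + b_x * x) / plogis(a + b_x * x)
# Log link: the ratio is exp(b_treat) regardless of x
exp(a + b_treat + b_x * x) / exp(a + b_x * x)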

So there is no way to test and evaluate the presence of a linear effect using the counterfactual predictions?

I don’t think you can enforce a constraint that it be linear after-the-fact but it might turn out that way.