Binomial regression with a log link

Hi,

I am trying to fit a binomial regression with a log link to some count data to estimate a relative risk (model posted at the end). The model seems to work OK and retrieves the expected age-adjusted relative risk (RR = exp(betaA)). However, I sometimes run into the problem of bad initialisation, namely that theta > 1. I’ve tried altering the init range from 0.0000000000001 to 5, with no luck.

I saw there was some previous discussion on binomial models with a log link (Log binomial random intercept model). From what I gathered, the conclusion was that the log link is not well suited to probabilities. However, without the binomial-log model I am unsure how to model relative risks. I could, technically, fit any other kind of model, simulate the counts using the input variables, and calculate the Cochran-Mantel-Haenszel adjusted RR (http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704-EP713_Confounding-EM/BS704-EP713_Confounding-EM7.html). This, however, poses two problems:

  1. Which alternative model to fit? A fully saturated Poisson log-linear model would involve a lot of interactions and seems undesirable (especially if I want to examine a risk factor x age interaction at some point).
  2. I might end up with division by zero if some values of c in the posterior are zero (notation according to the website).

Alternatively, I could provide some custom initialisation values and hope Stan figures out how to explore the posterior effectively – it seems to run OK as long as the initialisation is valid. One option could be to pick a set of coefficient values whose sum is < 0 (a sketch of passing such inits is included after the model code below). Would this still count as a diffuse initialisation, so that the R-hat values are still meaningful?

Any advice would be much appreciated.

Thanks in advance!

data {
  int <lower=0> N; // Number of rows
  int <lower=0> B[N]; // Number of events in group
  int <lower=0> Total[N]; // Total number of patients in group

  vector<lower=0, upper=1>[N] A; // Exposure indicator
  vector<lower=0, upper=1>[N] Agebin6;
  vector<lower=0, upper=1>[N] Agebin7;
  vector<lower=0, upper=1>[N] Agebin8;
  vector<lower=0, upper=1>[N] Agebin9;
  vector<lower=0, upper=1>[N] Agebin10;
  vector<lower=0, upper=1>[N] Agebin11;
  vector<lower=0, upper=1>[N] Agebin12;
  vector<lower=0, upper=1>[N] Agebin13;
  vector<lower=0, upper=1>[N] Agebin14;
  vector<lower=0, upper=1>[N] Agebin15;
  vector<lower=0, upper=1>[N] Agebin16;
  vector<lower=0, upper=1>[N] Agebin17;
}
parameters {
  // Coefficients
  real beta0;
  real betaA;
  real betaAge6;
  real betaAge7;
  real betaAge8;
  real betaAge9;
  real betaAge10;
  real betaAge11;
  // real betaAge12; // Reference age bin
  real betaAge13;
  real betaAge14;
  real betaAge15;
  real betaAge16;
  real betaAge17;
}
model {
  beta0 ~ normal(0, 1);
  betaA ~ normal(0, 1);

  betaAge6 ~ normal(0, 1);
  betaAge7 ~ normal(0, 1);
  betaAge8 ~ normal(0, 1);
  betaAge9 ~ normal(0, 1);
  betaAge10 ~ normal(0, 1);
  betaAge11 ~ normal(0, 1);
  //betaAge12 ~ normal(0, 1);
  betaAge13 ~ normal(0, 1);
  betaAge14 ~ normal(0, 1);
  betaAge15 ~ normal(0, 1);
  betaAge16 ~ normal(0, 1);
  betaAge17 ~ normal(0, 1);

  /* Likelihood specification */  
  {
    vector[N] theta;
    theta = exp(beta0 +
            betaA * A +
            betaAge6 * Agebin6 +
            betaAge7 * Agebin7 +
            betaAge8 * Agebin8 +
            betaAge9 * Agebin9 +
            betaAge10 * Agebin10 +
            betaAge11 * Agebin11 +
            // betaAge12 * Agebin12 + // Reference age bin
            betaAge13 * Agebin13 +
            betaAge14 * Agebin14 +
            betaAge15 * Agebin15 +
            betaAge16 * Agebin16 +
            betaAge17 * Agebin17);
    // Increment likelihood
    B ~ binomial(Total, theta);
  }
}
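
For reference, here is roughly how I would pass such custom inits with rstan, so that every chain starts inside the admissible region (the file name logbinom.stan and the list stan_data are just placeholders for the model above and my data):

library(rstan)

# Start every coefficient slightly negative so the initial linear predictor
# is < 0 and therefore theta = exp(linear predictor) < 1.
par_names <- c("beta0", "betaA", paste0("betaAge", c(6:11, 13:17)))
init_fun <- function() setNames(as.list(rep(-0.1, length(par_names))), par_names)

fit <- stan("logbinom.stan", data = stan_data,
            init = init_fun, chains = 4)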

Yeah log links with binomial models are tricky. What about using a logit link? Then the parameter estimates are log-odds of course, but you can still compute conditional probabilities in the generated quantities block using the posterior predictive distribution rather than having to work with the log-odds. Could you then calculate the relative risk from those probabilities without needing to use the log-link? Or is there some reason the log link is absolutely necessary?

Update: see Ben’s post below for an example.


I just read the previous post you linked to. What I’m suggesting is basically what @betanalpha suggested in that post:

In order to understand relative risks, for example, the full probability curve (with uncertainties!) can be produced in the generated quantities block by varying the input covariates. The inclusion of principled priors and reframing the inferences in terms of entire distributions avoid the fragile estimators.


Why is this? Binomial models with a log link are often used in epi for RR models, so it would be useful for me to know if this ever comes up!

The key is to think about relative risks (or whatever is of interest) as an implication of the generative process you are trying to model instead of necessarily being a parameter in that model. In the simple case, you could estimate a Bernoulli or a binomial model with a good link function (not log), obtain the posterior distribution of the probabilities under different scenarios, and then divide to obtain a relative risk ratio. In rstanarm (or brms) it would look like

library(rstanarm)
post <- stan_glm(y ~ treatment + x, data = dataset,
                 family = binomial(link = "logit"), QR = TRUE)
nd <- dataset
nd$treatment <- 0
Pr_0 <- posterior_linpred(post, newdata = nd, transform = TRUE) # Pr(y = 1 | treatment = 0)
nd$treatment <- 1
Pr_1 <- posterior_linpred(post, newdata = nd, transform = TRUE) # Pr(y = 1 | treatment = 1)
RR <- Pr_1 / Pr_0 # relative risk, one value per posterior draw and observation
dim(RR)

where dim(RR) will be number of simulations by number of observations and you can summarize that however you want.
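For example, to get one overall number plus its uncertainty, you could average over the observations within each posterior draw and then summarize across draws (a sketch continuing from the code above):

# Average RR over observations within each draw, then summarize across draws.
RR_draws <- rowMeans(RR)
c(mean = mean(RR_draws), quantile(RR_draws, probs = c(0.025, 0.975)))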


Primarily because probabilities have to be between 0 and 1. If you have eta = alpha + x * beta, which can take on any real value, then inv_logit(eta) is always a valid probability, but not exp(eta). That is, exp(x) is not a function from the reals to the unit interval.
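A quick illustration in R:

# A positive linear predictor still maps to a valid probability with the
# inverse logit, but exceeds 1 with the log link.
eta <- 0.5
plogis(eta)  # inverse logit: about 0.62, a valid probability
exp(eta)     # inverse log link: about 1.65, not a probability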

@timdisher to add a bit more to my previous response:

Using the log link when it makes no sense is just a hack for getting at RR, if I’m not mistaken. There are all sorts of hacks for getting around estimation problems while still using the log link, but as @bgoodri alluded to, if you use a generative modeling approach you are free to use a more coherent link function and there’s no such problem.

Thanks @jonah, this really helps. Beyond estimation issues within a Bayesian framework, are there established practical issues with these types of models (thinking of cases where people use frequentist methods and don’t pay the same time cost)? Clinical colleagues will typically balk if established methods give an essentially identical point estimate. This is probably an unfair question because it’s complicated by differences in prior specification as well (the people I know who are using Bayesian methods are also using old-school, super-vague flat priors), but I thought I would ask.

The issue with the log link is not a Bayesian thing, it’s a math thing. It also applies to non-Bayesian methods (and probably originates there too).

To be more clear, the problem originates from using the log link in the likelihood, so maximum likelihood approaches have to confront this as well.

But with optimization, frequentists can pretend everything is okay if the point estimate is within the admissible region and may be so for most or all of the datasets that could be sampled from that population. But to be Bayesian you have to take into account the uncertainty in the parameter conditional only on the data that you have (and your priors), which is a numerical disaster when you get to the frontier of the admissible region, so it is best to constrain the problem with an inverse link that maps from the entire real line to the \left(0,1\right) interval.


Sorry, I think I got distracted by focusing on the error message. My actual question is this: I frequently encounter log-link binomial models at conferences that are fit with MLE and “work”. Is the issue here that those people got lucky (and may have rejected other model specifications that failed to converge), or is there an actual issue with their estimates (i.e. they don’t have the properties they think they do)?

It mostly means that they don’t have the interpretation that people think they do. As long as the population parameters are on the interior of the parameter space, then under some other assumptions, the distribution of the maximum likelihood estimates across random samples from that population is asymptotically multivariate normal and you can use the normal distribution to make confidence intervals, p-values, etc.

None of those point estimates is what anyone should expect the relative risk ratio to be, conditional on the data they have actually observed. That is the expectation of the posterior distribution of relative risk and in order to draw from that posterior distribution in practice, you have to avoid falling into the inadmissible region.


Thanks a lot for all the great replies!

It seems that the best way forward is simply to calculate the relative risk from the posterior prediction of counts and use an appropriate formula for the (in this case) age-adjusted RR. It’s a bit of a hassle, since the adjustment involves a summation across all strata. Specifically, the formula (from the website) is

RR = \frac{\sum_i \frac{a_i (c_i + d_i) }{n_i}}{\sum_i \frac{c_i (a_i + b_i)}{n_i}}

in which i indexes the strata. This raises the second concern from my first post: if some value of c_i is zero, I will run into division by zero. I could add a pseudo-count, but this is generally frowned upon in epidemiology since it might bias the estimates (see e.g. https://onlinelibrary.wiley.com/doi/full/10.1002/sim.1761 - might be behind a paywall). Are there any tips or tricks for dealing with this?
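For concreteness, here is the adjustment as a small R function (a, b, c, d are the per-stratum 2x2 cell counts in the website's notation; I would feed it the posterior-predicted counts):

# Mantel-Haenszel adjusted RR from per-stratum 2x2 counts:
# a = exposed events, b = exposed non-events,
# c = unexposed events, d = unexposed non-events.
mh_rr <- function(a, b, c, d) {
  n <- a + b + c + d
  sum(a * (c + d) / n) / sum(c * (a + b) / n)
}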

And yes, the binomial-log model is generally accepted in epidemiology as a way to directly estimate the relative risk (by frequentists and Bayesians alike), but it is also well known that it can be extremely difficult to get it to converge. I wanted to use it because it’s not really questioned. It will be quite the experience to convince reviewers, who are not always that stat-savvy, that I can use a log-odds model to estimate the adjusted relative risk. :-)


A bit more literature reading, and I realized I could use a Poisson model instead. It does not yield the relative risk directly, but the rate ratio. For my purpose, this is good enough.

For those also having this problem, this is the Poisson model I used instead.

library(rstanarm)

stan_results <- stan_glm(B ~ A + AgeBin,
                         data = df,
                         family = poisson(link = "log"),
                         offset = log(Total), # log of the group size as exposure offset
                         prior = normal(0, 1), prior_intercept = normal(0, 1),
                         chains = 4, cores = 4, seed = 123456)
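
To get the adjusted estimate for A on the ratio scale, with its uncertainty, I then exponentiate the posterior draws of its coefficient (a sketch, using the fit above):

# Posterior draws of the coefficient for A, exponentiated to the ratio scale.
draws_A <- as.matrix(stan_results, pars = "A")
c(mean = mean(exp(draws_A)), quantile(exp(draws_A), probs = c(0.025, 0.975)))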

Now another issue is that the number of people within each age group varies a lot, ranging from 2 to 12,000 individuals. This causes the age coefficients to be quite extreme: the MLE estimates range from -18 to ~0. Even with very wide priors and QR=TRUE, Stan comes nowhere close to this. This might be OK, because it does not seem to change the estimate of the exposure factor, A.

The issue is on the high end. If prevalence is rare (small event probabilities), predicted probabilities will rarely be greater than 1 with the log link. Less than zero is not a problem, since exp(x) > 0. For rare events, Poisson and binomial models with a log link are almost identical, albeit the binomial with a log link is not strictly an admissible model.
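A quick simulated check of that claim (a sketch using plain glm just to keep it short; the start values keep the log-binomial fit inside the admissible region):

# Rare outcome (baseline risk 1%) with a true RR of 1.5: the log-binomial
# and Poisson coefficient estimates for x come out nearly identical.
set.seed(1)
n <- 1e5
x <- rbinom(n, 1, 0.5)
y <- rbinom(n, 1, 0.01 * (1 + 0.5 * x))
exp(coef(glm(y ~ x, family = binomial(link = "log"),
             start = c(log(mean(y)), 0)))["x"])            # log-binomial RR, ~1.5
exp(coef(glm(y ~ x, family = poisson(link = "log")))["x"]) # Poisson rate ratio, ~1.5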

Hello @bgoodri,
how do you then compute the RR estimate (its mean) and the 95% CI?

From the posterior draws

Hi @bgoodri,
I am confused. Can you please check this? Do you think I made a mistake:

> datset = data.frame(
+   death = sample(0:1,190,replace = T),
+   treatment = sample(0:1,190,replace = T),
+   age = sample(seq(25,60,by=1), 190, replace = T)
+ )
> 
> str(datset)
'data.frame':	190 obs. of  3 variables:
 $ death    : int  1 0 1 0 1 0 0 1 1 1 ...
 $ treatment: int  0 0 0 0 0 0 0 1 1 0 ...
 $ age      : num  31 35 35 50 25 43 31 50 57 37 ...
> 
> library(rstanarm)
> post <- stan_glm(death ~ treatment +age, 
+                  data = datset, 
+                  family = binomial(link = "logit"), 
+                  QR = FALSE,
+                  refresh=0)
> nd <- datset
> nd$treatment <- 0
> Pr_0 <- posterior_linpred(post, newdata = nd, transform = TRUE)
> nd$treatment <- 1
> Pr_1 <- posterior_linpred(post, newdata = nd, transform = TRUE)
> RR <- Pr_1 / Pr_0
> dim(RR)
[1] 4000  190
> 
> RR_average= apply(RR,2,mean)
> length(RR_average)
[1] 190
> 
> stat = function(x)
+   c(Mean = mean(x),  
+     quantile(x, 0.025), 
+     quantile(x,0.975))
> 
> stat(RR_average)
     Mean      2.5%     97.5% 
0.9369137 0.9282050 0.9464520 
> 
> 
> ## If I compare that with family poisson
> 
> post2 <- stan_glm(death ~ treatment +age, 
+                  data = datset, 
+                  family = poisson(link = "log"), 
+                  QR = FALSE,
+                  refresh=0)
> exp(coef(post2))
(Intercept)   treatment         age 
  0.4464778   0.9258809   1.0056787 

Why would family = poisson ever be appropriate for binary outcomes?

I do not know, but I have tested that on many datasets and it was very good at computing the RR.

But what is your opinion of the computation I did?