Conditional effects from stan_clogit fit

Dear all,

I’ve used rstanarm::stan_clogit to estimate the effect of a certain intervention on mortality. In short, our design is a matched risk-set case-control study with 1 case and 5 controls per risk set/stratum. I used rms::rcs as suggested previously but I’m now struggling a bit with getting conditional effects of the splines. brms (which has the nice conditional_effects function) doesn’t support the conditional logistic likelihood, as far as I know, which is why I’m sticking with rstanarm. It is possible to re-cast the conditional logistic regression as a Cox regression, but that requires defining the strata, and I can’t seem to find out how to do that in brms.

I feel like I’m missing something obvious, but since I’m quite stuck, I wanted to reach out to the community and hopefully get some pointers on moving forwards with this.

So, to continue with the example from my other post:

library(dplyr)
library(rstanarm)
library(rms)
dat <- arrange(infert, stratum) %>% 
	filter(parity <= 2) # yields 1 case and 2 controls per stratum
post <- stan_clogit(case ~ rcs(age) + spontaneous + induced, 
                    strata = stratum, data = dat, QR = TRUE)

 
As per the posterior_predict documentation,

Drawing from the posterior predictive distribution at interesting values of the predictors also lets us visualize how a manipulation of a predictor affects (a function of) the outcome(s).

All good, but the Details section specifies,

For models estimated with stan_clogit , the number of successes per stratum is ostensibly fixed by the research design. Thus, when doing posterior prediction with new data, the data.frame passed to the newdata argument must contain an outcome variable and a stratifying factor, both with the same name as in the original data.frame. Then, the posterior predictions will condition on this outcome in the new data.

My question is, then, if I wanted to see how age affects the outcome (e.g. over a grid in [18:45]) for women with spontaneous = 0 and induced = 0, how would I go about it? That is, what newdata should I give the posterior_predict? Or am I completely off track and need to do something else? I’m running rstanarm v2.18.2.

Cheers,
Ben

1 Like

I would use the latest rstanarm to get the posterior_epred() function, but it is equivalent to posterior_linpred(*, transform = TRUE) in earlier versions of rstanarm. Anyway, you could do something like

nd <- data.frame(age = 18:45, spontaneous = 0, induced = 0, stratum = "1", 
                 case = c(1, rep(0, 27))))
mu <- posterior_epred(post, newdata = nd)
summary(rowSums(mu)) # all 1.0

Good stuff, thanks! The stuff about the case column threw me a bit off, but if I understand correctly it doesn’t matter which of the newdata-patients is given the status as case, because it’s solely needed to make the conditional likelihood behave a give correct predictions. Or am I wrong about that?

In general, @bgoodri, thanks for all your help with this! If it’s okay with you, I’d like to acknowledge your help in our manuscript if you don’t mind. If you would prefer for me to not do that, that’s of course very fine – I’ll only do it if I get explicit consent :-)

I compared the above mu with another one that puts the case at the end

nd2 <- data.frame(age = 18:45, spontaneous = 0, induced = 0, stratum = "1", 
                  case = c(rep(0, 27), 1))
mu2 <- posterior_epred(post, newdata = nd2)

and they were identical. So, I think you are good. I just wouldn’t try to do anything afterward that assumed a different number of successes in a stratum than what the model conditioned on.

If you want to put something in a footnote, go ahead. Stan needs the credit more than I do.

Yeah, my way to the conclusion was same. And I definitely won’t start mingling with other numbers of cases per stratum.

Re acknowledge: great! I already cite Stan in as many ways I can think of – it definitely needs all the publicity it can get.

I’ve bumped into two problems, and maybe they’re more appropriate for the Github repo, but just wanted to see if I’m completely on the wrong track first.

Prep work

library(rstanarm)
library(dplyr)

dat <- arrange(infert, stratum) %>% 
	filter(parity <= 2) %>% 
	mutate(test_covar = rnorm(n(), 10, 3))

# Shared newdata object
nd <- data.frame(age = 18:45, spontaneous = 0, induced = 0, stratum = "1", 
				 education = "0-5yrs", case = c(1, rep(0, 27)), test_covar = 8)

WORKS (equivalent to above)

post <- stan_clogit(case ~ rms::rcs(age) + spontaneous + induced, strata = stratum,
					data = dat, QR = TRUE)
mu <- posterior_epred(post, newdata = nd) 

FAILS

posterior_epred doesn’t like factors when newdata has only one level. If I manually code the contrast it of course works, because the coefficient in essence becomes numerical

post2 <- stan_clogit(case ~ rms::rcs(age) + spontaneous + induced + education, strata = stratum,
					 data = dat, QR = TRUE)
mu2 <- posterior_epred(post2, newdata = nd) 

 
Output:

Error in stanmat[, beta_sel, drop = FALSE] : subscript out of bounds 

FAILS

posterior_epred, it seems, tries to create splines based on the nd argument (with only one value, so that’s of course impossible) instead of simply evaluating the single values based on the learnt spline coefficients.

post3 <- stan_clogit(case ~ rms::rcs(age) + spontaneous + induced + rms::rcs(test_covar), strata = stratum,
					 data = dat, QR = TRUE)
mu3 <- posterior_epred(post3, newdata = nd) 

 
Output:

fewer than 3 unique knots.  Frequency table of variable:
x
 8 
28 
Error in rcspline.eval(x, nk = nknots, inclx = TRUE, pc = pc, fractied = fractied) : 
  
In addition: Warning message:
In rcspline.eval(x, nk = nknots, inclx = TRUE, pc = pc, fractied = fractied) :
  5 knots requested with 1 unique values of x.  knots set to -1 interior values.

Basically, should I open an issue or two at Github? Or am I on a wrong track?

Cheers,
Ben

1 Like

Yeah. Thanks.

Done, and thank you (everybody) for all your work!