Survival analysis with cumulative hazards

Jon_Mitchell · November 18, 2024, 3:30pm

Hello all,

I’m doing some work trying to understand how academic performance predicts freshmen retention. I have the GPA for each student each semester, along with some metrics measured as freshmen. At it’s core, the model is that struggling students are more likely to drop out, & that poor performance is one indicator of struggling (though not the only one!)

My data are: if/when the student left, the GPA gap between high school-fall semester, the GPA gap between high school-spring semester (or fall semester-spring semester), and a variety of non-academic “struggle” metrics measured before the student begins fall semester.

The issue I’m having is, if a student does badly in the Fall semester, I think that it increases their chance of dropping out after the Spring semester, even if they did ok. That is, a student’s dropout chance in Spring is a function of both their Spring performance & their fall performance…but I’m not sure the best way to actually model this kind of accumulating hazard as survival analyses aren’t really a thing I do a lot.

Any and all advice would be immensely appreciated!

Here’s code to simulate broadly what I think is going on:

#### Survival model simulation
inv.logit <- function(x)	{
	exp(x) / (1+exp(x))
}
logit <- function(x)	{
	log( x / (1 - x))
}

# parameters
sBetas <- c(0.5, 1.2, 0.75)
aBetas <- c(0.25, 1.1, 0.9)
bStrugGPA <- -1
bDiff <- -1
bDropS <- 1
bDropA <- -1
fBaseDrop <- -5
sBaseDrop <- 1.5

# struggles
Struggles <- rgamma(1e3, 2, 2)
names(Struggles) <- 1:length(Struggles)
fStruggles <- Struggles

Rating1 <- sBetas[1] * Struggles + rnorm(1e3, 0, 0.01)
Rating2 <- sBetas[2] * Struggles + rnorm(1e3, 0, 0.01)
Rating3 <- sBetas[3] * Struggles + rnorm(1e3, 0, 0.01)

# Course difficulty
Diff <- rnorm(1e3)
fDiff <- Diff
aRating1 <- aBetas[1] * Diff + rnorm(1e3, 0, 0.01)
aRating2 <- aBetas[2] * Diff + rnorm(1e3, 0, 0.01)
aRating3 <- aBetas[3] * Diff + rnorm(1e3, 0, 0.01)

# GPA measured as gap between observed GPA minus expected GPA
GPA_gap <- rnorm(1e3, bStrugGPA * Struggles + bDiff * Diff + rnorm(1e3, 0, 0.01), 2)
GPA_gap[which(GPA_gap > 2)] <- 2
GPA_gap[which(GPA_gap < -4)] <- -4
GPA_gap1 <- GPA_gap

# Perform
sStruggles <- fBaseDrop + bDropS * Struggles + bDropA * GPA_gap + rnorm(1e3, 0, 0.01)

fallDrop <- rbinom(1e3, 1, inv.logit(sStruggles))
names(fallDrop) <- names(sStruggles)
spStruggles <- Struggles[which(fallDrop == 0)] + -0.05 * GPA_gap[which(fallDrop == 0)]
spStruggles[which(spStruggles < 0)] <- 0

# second semester
Struggles <- sStruggles
Diff <- rnorm(length(Struggles))
GPA_gap <- rnorm(length(Struggles), bStrugGPA * Struggles + bDiff * Diff + rnorm(length(Struggles), 0, 0.1), 2)
GPA_gap[which(GPA_gap > 2)] <- 2
GPA_gap[which(GPA_gap < -4)] <- -4
GPA_gap2 <- GPA_gap
eStruggles <- sBaseDrop + bDropS * Struggles + bDropA * GPA_gap + rnorm(length(Struggles), 0, 0.01)
sDrop <- rbinom(length(Struggles), 1, inv.logit(eStruggles))
names(sDrop) <- names(Struggles)

left <- c(fallDrop, sDrop) 

## fake data
regdat <- data.frame(id = names(left), left=left, gap = c(GPA_gap1, GPA_gap2), pred1 = Rating1[as.numeric(names(left))], pred2 = Rating2[as.numeric(names(left))], pred3 = Rating3[as.numeric(names(left))], apred1 = aRating1[as.numeric(names(left))], apred2 = aRating2[as.numeric(names(left))], apred3 = aRating3[as.numeric(names(left))])

And here’s the model I’m not satisfied with.

# fit survival model
library(brms)

model <- bf(left | trials(1) ~ 0 + Intercept + gap + pred1 + pred2 + pred3 + apred1 + apred2 + apred3 + (1 + gap|id))

fit <- brm(model, 
	data = regdat,
	family = binomial(),
	prior(normal(0, 1), class = b),
	chains = 4, cores = 4, iter = 2000, warmup = 1000)

It does ok, but I feel like the model itself could be greatly improved, specifically it seems like it is treating the Spring semester dropout rate as a function of only the spring semester data, really.

I’m looking for advice/guidance on the best way to do so. I used brms here, but rstarnarm or even just straight stan are fine–I just am unclear on the best way to actually build the model here.

Thank you!

Bob_Carpenter · November 18, 2024, 8:18pm

Why not just code that model up in Stan directly?

If there are just two time periods and you aren’t looking at continuous dropout times, this doesn’t look like a standard survival model.

Topic		Replies	Views
Competing risks survival models Modeling	4	1331	April 25, 2018
Simulation Study of Survival-Longitudinal Data Joint Modeling - Some Estimations Are Off Modeling	2	707	August 29, 2020
I am preparing for a 'hackathon' my company is hosting, and could use some advice regarding if my idea is correct Modeling techniques	1	372	November 8, 2022
Censored data loglikelihood trouble Modeling	1	369	April 21, 2023
Recurrent Events Survival model Modeling	6	981	August 1, 2022

Survival analysis with cumulative hazards

Related topics