Predict cumulative cases in a range from a spline model in brms

Angelo_D_Ambrosio · January 21, 2023, 1:52pm

Hello, a simple question whose answer evades me.

I have a count model with a continuous predictor modelled as a spline.
I’d like to compute the predicted cumulative count given a range of the predictor.

Example: Poisson model of the number of patients by age for a hospital, with age being modelled as a spline.
How do I get the posterior predictive distribution of the number of patients between two ages?

The only solutions I could think of were:

(1) Model the cumulative distribution instead of the age distribution directly, then predict for two values and take the difference.
(2) Use the original spline model to make predictions for each year of age in the range of interest and then sum them up. I’m not sure if I’d be underestimating the final count by discretizing age this way.
(3) Discretize age into groups already in the training data. I don’t like this since I’d like to keep the smoothing effect of splines.

Is there a way to obtain what I need directly from the original spline model?

jsocolar · February 2, 2023, 5:50pm

Sorry this one got left behind. Given that this is a count model, the actual counts must not be distributed continuously along the predictor, but rather at discrete locations. For example, if the predictor is age and the response is the number of patients, you aren’t looking at the counts of patients by age measured to the nanosecond; you probably are looking at counts by age measured in years or something like that.

When you fit a spline in this model, you can evaluate it anywhere, but the result isn’t interpretable as the number of patients of an exact age. It is the number of patients falling into an age bin of some width (e.g. one year). So for example, if you have 10 patients who are 20 years old (i.e. between 20 and 21 years old) and 20 patients who are 21 years old (i.e. between 21 and 22 years old), and you fit a line through the points (20, 10) and (21, 20), you might predict that you have 15 patients who are 20.5 years old, by which you would really mean that you predict 15 patients between the ages of 20.5 and 21.5. So you will need to make a set of predictions at some set of discrete values and sum over them.

Note that there is no guarantee that the fitted spline will even be self-consistent. For example, imagine that the spline runs through the points (20, 1), (20.5, 3), and (21, 1). There is no way for all of these predictions to be simultaneously right. Packing 3 patients into the age range from 20.5 to 21.5 requires that either the 20-year-old category or the 21-year-old category has at least 2 patients in it.
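To spell out the arithmetic of the (20, 1), (20.5, 3), (21, 1) example, here is a minimal Python sketch. The variable names are made up for illustration, and each spline value is read as the count in a one-year window starting at that age.

```python
# Hypothetical spline values from the example, keyed by the start of a
# one-year age window: [20, 21), [20.5, 21.5), [21, 22).
window_counts = {20.0: 1, 20.5: 3, 21.0: 1}

# The window [20.5, 21.5) lies inside [20, 22), the union of the two
# integer-age bins, so its count cannot exceed the sum of their counts.
upper_bound = window_counts[20.0] + window_counts[21.0]
consistent = window_counts[20.5] <= upper_bound

print(upper_bound, consistent)  # prints: 2 False
```

The half-year window would need 3 patients while the two bins that contain it hold only 2 in total, so the three predictions cannot all be right at once.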

I think the simplest suggestion I can give is to (1) make sure that your data come in age bins of equal width, or else things could get really funky, and (2) predict from the spline at the age bins in the data (which will still induce some smoothing) and sum over these predictions to get the expected count within a range. This only works if the range you are interested in starts and ends on breaks between bins.
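As a rough sketch of the summing step, assume you already have a matrix of posterior predictive draws with one row per draw and one column per age bin. In brms such a matrix is what `posterior_predict()` returns when `newdata` has one row per bin. The Python below only illustrates the arithmetic; the function name `range_total_draws` and the toy numbers are invented for the example.

```python
from statistics import mean

def range_total_draws(draws, bin_starts, lo, hi, width=1):
    """Per-draw total count over the bins that exactly tile [lo, hi).

    draws: list of rows, one per posterior draw, one column per bin.
    bin_starts: sorted left edges of contiguous, equal-width bins.
    """
    cols = [j for j, start in enumerate(bin_starts)
            if lo <= start and start + width <= hi]
    # The range must begin and end on bin breaks, otherwise the sum
    # does not correspond to the requested range.
    if (not cols or bin_starts[cols[0]] != lo
            or bin_starts[cols[-1]] + width != hi):
        raise ValueError("range must start and end on bin breaks")
    # Sum within each draw so the result is itself a set of posterior draws.
    return [sum(row[j] for j in cols) for row in draws]

# Toy draws (made-up numbers): 3 draws, bins [20,21), [21,22), [22,23), [23,24).
bin_starts = [20, 21, 22, 23]
draws = [[10, 20, 15, 12],
         [12, 18, 14, 11],
         [9, 22, 16, 13]]

totals = range_total_draws(draws, bin_starts, 21, 23)
print(totals, mean(totals))  # prints: [35, 32, 38] 35
```

Summing within each draw, rather than summing per-bin summaries, keeps the result a posterior predictive distribution for the total, from which you can take a mean or quantiles.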
