I am working with a piecewise model of viral trajectories with the following function in Stan:
// mufun returns mean Ct:
real mufun(real t, real peak, real peaktime, real pslope, real cslope, real lod){
// Proliferation
if(t <= peaktime)
return(peak + pslope * t);
// Clearance
else
return(peak + (pslope * peaktime + cslope * (t - peaktime)));
}
and am calculating out of sample (smoother) trajectories by increasing the number of x
values (time).
When I do so by sampling the parameters and manually calculating the estimate, I get fairly broad uncertainty:
parameter_frame %>%
select(omicron, episode, peak, peaktime, pslope, cslope, .draw) %>%
expand_grid(.
, Time = seq(-10, 18, 1)) %>%
mutate(Pred = ifelse(Time < peaktime, peak + pslope * Time
, peak + (pslope * peaktime + cslope * (Time - peaktime)))
)
When I use the stand-alone generated quantities feature of cmdstanr
, I get much narrower uncertainty:
I am creating the prediction data with the following function:
make_pred_data <- function(data, day_space){
newcols <- c("de.id", "omicron", "episode", "ep_time_c", "days.from.peak", "ORF_lod")
expand_grid(de.id = "1"
, omicron = c(unique(data$omicron))
, episode = c(unique(data$episode))
, ep_time_c = 0
, days.from.peak = seq(min(data$days.from.peak), max(data$days.from.peak), by = day_space)
, ORF_lod = lod) %>%
rename_with(~ newcols)
}
and passing to Stan as:
Stan_Pred_Data <- lst(N = nrow(Pred_Data)
, Y = Pred_Data$ORF_lod
, t = Pred_Data$days.from.peak
, lod = lod
, ub = rep(lod, nrow(Pred_Data))
, lb = rep(0.0, nrow(Pred_Data))
, X_peak = X_peak_pred
, K_peak = ncol(X_peak_pred)
, X_peaktime = X_peaktime_pred
, K_peaktime = ncol(X_peaktime_pred)
, X_pslope = X_pslope_pred
, K_pslope = ncol(X_pslope_pred)
, X_cslope = X_cslope_pred
, K_cslope = ncol(X_cslope_pred)
, Subj = as.integer(Pred_Data$de.id)
, nSubj = length(unique(Pred_Data$de.id))
, N_new = nrow(Pred_Data)
, t_new = Pred_Data$days.from.peak
, X_peak_new = X_peak_pred
, X_peaktime_new = X_peaktime_pred
, X_pslope_new = X_pslope_pred
, X_cslope_new = X_cslope_pred
)
Generated quantities are calculated as:
vector[N_new] Duration;
vector[N_new] Stage;
vector[N_new] epred;
for (i in 1 : N_new) {
epred[i] = mufun(t[i], peak_epred[i], peaktime_epred[i], pslope_epred[i], cslope_epred[i], lod);
}
I can’t figure out why the uncertainty is so much smaller using the stand-alone. My intention was to create the same output without the manual wrangling, but now I’m wondering which approach is more reliable.