Large difference in uncertainty between manual calculation and stand-alone generated quantities

I am working with a piecewise model of viral trajectories with the following function in Stan:

 // mufun returns mean Ct:
        real mufun(real t, real peak, real peaktime, real pslope, real cslope, real lod){
                
          // Proliferation 
         if(t <= peaktime)
         
              return(peak + pslope * t);
              
            // Clearance
            else 
            
              return(peak + (pslope * peaktime + cslope * (t - peaktime)));
            
          }

and am calculating out of sample (smoother) trajectories by increasing the number of x values (time).

When I do so by sampling the parameters and manually calculating the estimate, I get fairly broad uncertainty:

parameter_frame %>%
                select(omicron, episode, peak, peaktime, pslope, cslope,  .draw) %>%
                expand_grid(.
                            , Time = seq(-10, 18, 1)) %>%
                mutate(Pred = ifelse(Time < peaktime, peak + pslope * Time
                                     , peak + (pslope * peaktime + cslope * (Time - peaktime)))
                )


When I use the stand-alone generated quantities feature of cmdstanr, I get much narrower uncertainty:

I am creating the prediction data with the following function:

make_pred_data <- function(data, day_space){
        newcols <- c("de.id", "omicron", "episode", "ep_time_c", "days.from.peak", "ORF_lod")
        expand_grid(de.id = "1"
                    , omicron = c(unique(data$omicron))
                    , episode = c(unique(data$episode))
                    , ep_time_c = 0
                    , days.from.peak = seq(min(data$days.from.peak), max(data$days.from.peak), by = day_space)
                    , ORF_lod = lod) %>%
                rename_with(~ newcols)
        
}

and passing to Stan as:

Stan_Pred_Data <- lst(N = nrow(Pred_Data)
                  , Y = Pred_Data$ORF_lod
                  , t = Pred_Data$days.from.peak
                  , lod = lod
                  , ub = rep(lod, nrow(Pred_Data))
                  , lb = rep(0.0, nrow(Pred_Data))
                  , X_peak = X_peak_pred
                  , K_peak = ncol(X_peak_pred)
                  , X_peaktime = X_peaktime_pred
                  , K_peaktime = ncol(X_peaktime_pred)
                  , X_pslope = X_pslope_pred
                  , K_pslope = ncol(X_pslope_pred)
                  , X_cslope = X_cslope_pred
                  , K_cslope = ncol(X_cslope_pred)
                  , Subj = as.integer(Pred_Data$de.id)
                  , nSubj = length(unique(Pred_Data$de.id))
                  
                  , N_new = nrow(Pred_Data)
                  , t_new = Pred_Data$days.from.peak
                  , X_peak_new = X_peak_pred
                  , X_peaktime_new = X_peaktime_pred
                  , X_pslope_new = X_pslope_pred
                  , X_cslope_new = X_cslope_pred
                 
)

Generated quantities are calculated as:

    vector[N_new] Duration;
    vector[N_new] Stage;
    vector[N_new] epred;
    
     for (i in 1 : N_new) {
             
                epred[i] = mufun(t[i], peak_epred[i], peaktime_epred[i], pslope_epred[i], cslope_epred[i], lod);                
                
                
}

I can’t figure out why the uncertainty is so much smaller using the stand-alone. My intention was to create the same output without the manual wrangling, but now I’m wondering which approach is more reliable.

The only thing that immediately stands out is that mufun has <= while the R code has < for the time comparison.

To help reduce sources of discrepancy, I’d suggest exposing the mufun function to R and checking whether the results are still different

Thanks Andrew!

How do I go about exposing the mufun to R?

rstan: Expose user-defined Stan functions to R for testing and simulation — expose_stan_functions • rstan

cmdstanr: Expose Stan functions to R — model-method-expose_functions • cmdstanr

2 Likes

I think I’m doing something wrong in how I’m wrangling and calculating in R:

make_preds <- function(fit, x, pred_data) {
        samples <- get_peaks_cov(fit = fit,  x= x)

                matrix(fit$draws(format = "matrix"
                                 , variable = c(paste0("b_peak[", 1:length(colnames(x)), "]")))
                       , ncol=length(colnames(x)))
        
        mat = matrix(NA, nrow = nrow(x), ncol = dim(samples)[1])
        for(i in 1:dim(samples)[1]){
                
                mat[,i] <- x%*%samples[i,]
        }
        
        cbind(pred_data,
              mat) %>%
                pivot_longer(`1`:last_col()
                             , names_to = ".draw"
                             , values_to = "pred") %>%
                select(pred)
}

Something in there blows up the number of rows.

But, this standalone generated quantities feature is awesome!