Large difference in uncertainty between manual calculation and stand-alone generated quantities

JLC · June 16, 2024, 5:48pm

I am working with a piecewise model of viral trajectories with the following function in Stan:

 // mufun returns mean Ct:
        real mufun(real t, real peak, real peaktime, real pslope, real cslope, real lod){
                
          // Proliferation 
         if(t <= peaktime)
         
              return(peak + pslope * t);
              
            // Clearance
            else 
            
              return(peak + (pslope * peaktime + cslope * (t - peaktime)));
            
          }

and am calculating out of sample (smoother) trajectories by increasing the number of x values (time).

When I do so by sampling the parameters and manually calculating the estimate, I get fairly broad uncertainty:

parameter_frame %>%
                select(omicron, episode, peak, peaktime, pslope, cslope,  .draw) %>%
                expand_grid(.
                            , Time = seq(-10, 18, 1)) %>%
                mutate(Pred = ifelse(Time < peaktime, peak + pslope * Time
                                     , peak + (pslope * peaktime + cslope * (Time - peaktime)))
                )

When I use the stand-alone generated quantities feature of cmdstanr, I get much narrower uncertainty:

I am creating the prediction data with the following function:

make_pred_data <- function(data, day_space){
        newcols <- c("de.id", "omicron", "episode", "ep_time_c", "days.from.peak", "ORF_lod")
        expand_grid(de.id = "1"
                    , omicron = c(unique(data$omicron))
                    , episode = c(unique(data$episode))
                    , ep_time_c = 0
                    , days.from.peak = seq(min(data$days.from.peak), max(data$days.from.peak), by = day_space)
                    , ORF_lod = lod) %>%
                rename_with(~ newcols)
        
}

and passing to Stan as:

Stan_Pred_Data <- lst(N = nrow(Pred_Data)
                  , Y = Pred_Data$ORF_lod
                  , t = Pred_Data$days.from.peak
                  , lod = lod
                  , ub = rep(lod, nrow(Pred_Data))
                  , lb = rep(0.0, nrow(Pred_Data))
                  , X_peak = X_peak_pred
                  , K_peak = ncol(X_peak_pred)
                  , X_peaktime = X_peaktime_pred
                  , K_peaktime = ncol(X_peaktime_pred)
                  , X_pslope = X_pslope_pred
                  , K_pslope = ncol(X_pslope_pred)
                  , X_cslope = X_cslope_pred
                  , K_cslope = ncol(X_cslope_pred)
                  , Subj = as.integer(Pred_Data$de.id)
                  , nSubj = length(unique(Pred_Data$de.id))
                  
                  , N_new = nrow(Pred_Data)
                  , t_new = Pred_Data$days.from.peak
                  , X_peak_new = X_peak_pred
                  , X_peaktime_new = X_peaktime_pred
                  , X_pslope_new = X_pslope_pred
                  , X_cslope_new = X_cslope_pred
                 
)

Generated quantities are calculated as:

    vector[N_new] Duration;
    vector[N_new] Stage;
    vector[N_new] epred;
    
     for (i in 1 : N_new) {
             
                epred[i] = mufun(t[i], peak_epred[i], peaktime_epred[i], pslope_epred[i], cslope_epred[i], lod);                
                
                
}

I can’t figure out why the uncertainty is so much smaller using the stand-alone. My intention was to create the same output without the manual wrangling, but now I’m wondering which approach is more reliable.

andrjohns · June 16, 2024, 7:01pm

The only thing that immediately stands out is that mufun has <= while the R code has < for the time comparison.

To help reduce sources of discrepancy, I’d suggest exposing the mufun function to R and checking whether the results are still different

JLC · June 16, 2024, 7:21pm

Thanks Andrew!

How do I go about exposing the mufun to R?

andrjohns · June 16, 2024, 7:25pm

rstan: Expose user-defined Stan functions to R for testing and simulation — expose_stan_functions • rstan

cmdstanr: Expose Stan functions to R — model-method-expose_functions • cmdstanr

JLC · June 19, 2024, 3:54pm

I think I’m doing something wrong in how I’m wrangling and calculating in R:

make_preds <- function(fit, x, pred_data) {
        samples <- get_peaks_cov(fit = fit,  x= x)

                matrix(fit$draws(format = "matrix"
                                 , variable = c(paste0("b_peak[", 1:length(colnames(x)), "]")))
                       , ncol=length(colnames(x)))
        
        mat = matrix(NA, nrow = nrow(x), ncol = dim(samples)[1])
        for(i in 1:dim(samples)[1]){
                
                mat[,i] <- x%*%samples[i,]
        }
        
        cbind(pred_data,
              mat) %>%
                pivot_longer(`1`:last_col()
                             , names_to = ".draw"
                             , values_to = "pred") %>%
                select(pred)
}

Something in there blows up the number of rows.

But, this standalone generated quantities feature is awesome!

Topic		Replies	Views
Generate Quantities - Mismatch between model and fitted_parameters CmdStan	3	590	September 28, 2022
Standalone generated quantities usage help : cmdstanr CmdStan techniques , fitting-issues , algorithms	3	986	April 23, 2020
Prediction in generated quantities vs. model parameters General rstan , posterior-predictive	4	1034	September 15, 2020
Generated Quantities Returning NaN Modeling fitting-issues	2	929	September 29, 2021
Having trouble recovering known parameters from simulated data Modeling cmdstanpy	2	641	June 4, 2021

Large difference in uncertainty between manual calculation and stand-alone generated quantities

Related topics