Stan does not support NA - Multivariate model with missing values

Dear Stan community,

I am fitting a multivariate model with multiple variable missing values and I am facing with an error message like this:

Error in FUN(X[[i]], …) :
Stan does not support NA (in X_planktonabunz) in data
failed to preprocess the data; sampling not done


Does @Bob_Carpenter’s answer in 2018 still hold ? Any advice is welcome in regards to this specific case.

Load data

m2_sampled.csv (44.5 KB)

m2 <- read.csv(here("data","m2_sampled.csv"), row.names = NULL) |> 
  
  mutate_at( c( "month"), factor)

Model formula

# formula 
bform <- 
  # the outcome 
  bf(delta_DOC ~ 0 + Intercept + DOC_input_z + mi(water_res_time_z) + mi(cover_litter_z) + mi(plankton_abun_z) + mi(macrophy_abun_z)  ) + gaussian() +
   # Amount of DOC upstream the beaver pond
  
  bf(DOC_input_z  ~ 0 + Intercept + month + slope_z) + gaussian() +

  # solar radiation in the beaver pond
  
  bf(solar_z | mi()  ~ 0 + Intercept + month ) + gaussian() + 
  
  # Primary producers diversity and abundance

  bf(plankton_abun_z  | mi() ~ 0 + Intercept + solar_z + mi(macrophy_abun_z) + water_volume_z + DOC_input_z) + gaussian() +
  
  bf(macrophy_abun_z  | mi() ~ 0 + Intercept + solar_z  + water_volume_z + DOC_input_z) + gaussian() +
  
  bf(cover_litter_z | mi() ~ 0 + Intercept + solar_z) + gaussian() +
  
  # Beaver's pond characteristics
  bf(height_diff_z | mi()  ~ 0 + Intercept + slope_z ) + gaussian() +
  
  bf(dam_persistence ~ 0 + Intercept + slope_z)  + poisson() + # time period (years) representing how long the beaver dam is present 
  
  bf(n_dams ~ 0 + Intercept + slope_z ) + poisson() + # number of beaver dams that are found in the stream 

  # HIDROLOGY 
  
  bf(water_res_time_z | mi()  ~ 0 + Intercept  + water_volume_z ) + gaussian() +
  
  bf(water_volume_z  ~ 0 + Intercept + n_dams + dam_persistence + mi(height_diff_z) + slope_z  ) + gaussian() +
  
  set_rescor(FALSE)

Set priors


pr  <- c(    
  # define prior on all Intercept terms for continuos variable at once Note that we define priors on standardized predictors.
  set_prior("normal(0, 1.5)", class = "b", coef = "Intercept",  resp = c("coverlitterz" , "DOCinputz" , "heightdiffz", "macrophyabunz" , "planktonabunz", "solarz", "waterrestimez", "watervolumez")  ),
 
   # define  prior on categorical predictors
  set_prior("normal(0, 10)", class = "b", coef = "Intercept",  resp =  c("dampersistence", "ndams" )  ),
   
  # define a prior on the response variable delta DOC 
   set_prior("normal(0, 3)", class = "b", coef = "Intercept",  resp =  "deltaDOC"   ),
  
  # define population level effect (slopes ) of continuos numerical predictors 
  set_prior("normal(0, .5)", class = "b",  resp = c("coverlitterz", "DOCinputz" , "heightdiffz", "macrophyabunz" , "planktonabunz", "solarz", "waterrestimez", "watervolumez")  ),
  
# define population level effect (slopes ) of categorical predictors 
   set_prior("normal(0, 0.5)", class = "b",  resp =  c("dampersistence", "ndams" )),

# define population level effect (slopes ) of response variable outcome delta DOC   
   set_prior("normal(0, 0.5)", class = "b",  resp =  "deltaDOC"   )
  )

Validate priors


validate_prior(prior = pr,
               formula = bform,
               data = m2)

Fit the model

#Prior predictive check with sample_prior. # This model is used below to perform a prior predictive check

b0 <- brm(bform,
        data = m2, 
        prior = pr,
        iter = 2000, 
        warmup = 500, 
        chains = 4,
        cores = 4, 
         backend = "cmdstanr",
        seed = 7, # allow exact replication of results 
        sample_prior = "only"
        )

Yes, it still holds. Stan will not accept NAs.

Hey @leomarameo7,
the reason you get that error is that you forgot to use mi() for solar_z as a predictor.

bf(plankton_abun_z  | mi() ~ 0 + Intercept + mi(solar_z) + mi(macrophy_abun_z) + water_volume_z + DOC_input_z) + gaussian()

instead of

bf(plankton_abun_z  | mi() ~ 0 + Intercept + solar_z + mi(macrophy_abun_z) + water_volume_z + DOC_input_z) + gaussian()

and so on for the next two formulas.

1 Like