Handling missing data in zero-inflated models because data doesn't exist

jarroyoe · April 30, 2024, 4:02pm

I have a dataset where I am looking at how many customers a store receives per hour (Customers). When there are customers, I am also interested in understanding how much each individual customer has spent. There are some hours where there are no customers, which I am trying to study using a hurdle model. However, this means that in those cases there is no money spent, so the column Spent would be NA. My data looks something like this:

Hour Spent Customers
1       20         1
1       10         2
2       NA         0
3       NA         0
4       10         2
5       15         1
6       10         3
7       NA         0

I’m trying to fit a model of this type:

library(brms)

model <- brm(bf(Customers ~ Hour + Spent,
                hu ~ Hour + Spent),
             data = df,
             family = hurdle_poisson())

However, I’m not sure how to handle these NAs. Technically they are not missing data, since there was nothing to collect at those times, so I’m not sure whether imputation or the mi() function is an appropriate way to handle them.

I would appreciate any suggestions that may come up.

Bob_Carpenter · April 30, 2024, 4:42pm

It looks like your model’s trying to predict customers based on hour and amount spent.

I doubt that hour is going to be a linear effect if this is store hours, so I’d suggest using an hour random effect rather than a fixed effect. I think that’s (1 | Hour) in the formula, but I’m not 100% sure, not being a brms or lme4 user.

I don’t know brms, but my guess is that you want to change those NA to 0. The dollar amount spent isn’t unknown, it’s zero. Is it possible to have 1 or more customers and 0 dollars spent? If it really were NA, it might be 0, 10, 20, or some other value.
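A minimal base R sketch of that recoding, assuming the data frame is called df as in the question and using the eight rows posted there. It first checks that every NA in Spent sits on a zero-customer row, then sets those entries to 0, and finally looks for hours with customers but no spending.

```r
# Toy data copied from the question; NA marks hours with no customers.
df <- data.frame(
  Hour      = c(1, 1, 2, 3, 4, 5, 6, 7),
  Spent     = c(20, 10, NA, NA, 10, 15, 10, NA),
  Customers = c(1, 2, 0, 0, 2, 1, 3, 0)
)

# Sanity check: every NA in Spent should sit on a zero-customer row.
stopifnot(all(df$Customers[is.na(df$Spent)] == 0))

# Recode the structural NAs as 0 dollars spent.
df$Spent[is.na(df$Spent)] <- 0

# Is there any hour with customers but zero dollars spent?
any(df$Customers > 0 & df$Spent == 0)
```

If the sanity check fails on the real data, some NAs are genuinely unknown amounts rather than structural zeros, and those rows would need separate treatment.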

Assuming you have a bunch of days, I’d suggest plotting the number of customers for each hour as a bar plot (multiple plots, one per hour) and seeing if they’re consistent with a Poisson distribution or if they need zero inflation of some kind (hurdle or direct).
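One way that check could be sketched in base R. The data here are simulated, not real: the 60 days, the 8 store hours, and the hourly rates are all made up for illustration, and the names sim and zero_check are hypothetical. Each panel is a bar plot of customer counts for one hour, and the table compares the observed share of zeros with what a Poisson with the same mean would imply.

```r
# Simulated stand-in for many days of data: 60 days x 8 store hours.
# The hourly rates below are made up for illustration only.
set.seed(1)
hours <- 1:8
rates <- c(0.5, 1, 2, 3, 3, 2, 1, 0.5)
sim <- data.frame(
  Hour      = rep(hours, each = 60),
  Customers = rpois(length(hours) * 60, lambda = rep(rates, each = 60))
)

op <- par(mfrow = c(2, 4))  # one panel per hour
zero_check <- data.frame(Hour = hours, observed = NA_real_, poisson = NA_real_)
for (i in seq_along(hours)) {
  y <- sim$Customers[sim$Hour == hours[i]]
  counts <- table(factor(y, levels = 0:max(sim$Customers)))
  barplot(counts, main = paste("Hour", hours[i]),
          xlab = "Customers", ylab = "Days")
  # Share of zeros observed vs. implied by a Poisson with the same mean.
  zero_check$observed[i] <- mean(y == 0)
  zero_check$poisson[i]  <- dpois(0, lambda = mean(y))
}
par(op)
zero_check
```

With real data, an observed zero share well above the Poisson column for a given hour would point toward a hurdle or zero-inflated model for that hour.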
