I am analyzing survey data in which some demographic groups are oversampled. Because I fit a logit or beta regression model, I am interested in average predictive comparisons for the predictors. Should weighting as in MRP be used? The sampling strata were included in the model.
In addition, there are survey data from two years, and the oversampling was not present in the first year.
* Operating System: Windows 10
* rstanarm Version: 2.21
Yes, I would say MRP would be a reasonable way to model these data. The lack of oversampling in the first year shouldn’t be an issue (in fact, it could make your life easier). But if the representation changed drastically between years, there could be a concern that the non-response structure changed in ways that go beyond the demographic variables you can adjust for (and that’s not something any statistical package will help you with).
Best of luck with your model!
I appreciate your reply, I can use the help! If I am looking at just the second year, which is oversampled, and I want an average predictive comparison for the predictors averaged over all the groups, do I need weighting, or is it enough that I have the strata in the model?
I am not sure I understand the question well (I also should have noted earlier that I am not an expert on surveys - I am just relaying second-hand knowledge). The way I understand MRP is that you fit a model that lets you predict how a single person in each stratum you can distinguish would respond. Then you generate the right number of predictions for each stratum based on your demographic data. I don’t think weighting enters the model directly. But I would feel more comfortable if @lauren checked my reasoning here (as she’s the resident survey expert).
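In symbols, the "right number of predictions for each stratum" step can be sketched as follows. The notation is my own assumption and not from any package: $j$ indexes the strata (poststratification cells), $N_j$ is the population count in cell $j$, and $\hat{\theta}_j$ is the model's prediction for a person in cell $j$.

$$
\hat{\theta}^{\mathrm{PS}} = \frac{\sum_{j} N_j \, \hat{\theta}_j}{\sum_{j} N_j}
$$

The sample sizes per cell do not appear here, which is why oversampling a cell should not, by itself, change the population-level estimate once the cell is in the model.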
Because I want to average over the values given by respondents for the other continuous predictors, I am using posterior_epred and passing it the dataset, or a subset of the dataset, as the new data. Therefore the oversampled groups are overrepresented in the new data. I didn’t know whether including the sampling strata in the model took care of that.
I think I understand. For poststratification, you would want to pass, as new data for prediction, a dataset that mimics your expected population structure, i.e. a dataset with the “correct” number of respondents in each group. Does that make sense?
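An equivalent way to get the same effect, without building a new dataset, is to keep the observed rows and reweight their predictions so that each group counts according to its population share. Below is a minimal sketch in plain Python, just to show the arithmetic. The function name, the toy numbers, and the `population_share` table are all hypothetical; in practice `predictions` would be something like one posterior draw of the per-respondent expected values, and the calculation would be repeated for each draw.

```python
from collections import Counter

def poststratified_mean(strata, predictions, population_share):
    """Average per-respondent predictions, reweighted so that each stratum
    counts according to its population share instead of its sample share."""
    n = len(strata)
    sample_count = Counter(strata)
    # Weight for a respondent in stratum s: population share / sample share.
    weights = [population_share[s] / (sample_count[s] / n) for s in strata]
    total = sum(w * p for w, p in zip(weights, predictions))
    return total / sum(weights)

# Hypothetical toy data: stratum "B" is 20% of the population
# but was oversampled to 50% of the sample.
strata = ["A", "A", "B", "B"]
predictions = [0.2, 0.2, 0.6, 0.6]
population_share = {"A": 0.8, "B": 0.2}

naive = sum(predictions) / len(predictions)   # treats the sample as the population
adjusted = poststratified_mean(strata, predictions, population_share)
print(naive, adjusted)
```

In this toy case the unweighted average is pulled toward the oversampled group "B", while the reweighted average matches the population mix. Having the strata in the model gives good per-stratum predictions, but the averaging step still has to use the population proportions.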
Yes, that does make sense. I could sample from the overrepresented groups in the data to get the correct proportions.
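A rough sketch of that resampling idea, assuming the population shares per group are known, could look like this. It is plain Python with hypothetical names (`resample_to_population`, `stratum_of`, the toy rows) and is not tied to any survey package; it draws rows with replacement, with probabilities chosen so that the resampled data approximate the population proportions.

```python
import random
from collections import Counter

def resample_to_population(rows, stratum_of, population_share, k, seed=0):
    """Draw k rows with replacement so that strata appear roughly in
    proportion to population_share rather than to their sample share."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(rows)
    sample_count = Counter(stratum_of(r) for r in rows)
    # Per-row weight: population share / sample share of the row's stratum.
    weights = [
        population_share[stratum_of(r)] / (sample_count[stratum_of(r)] / n)
        for r in rows
    ]
    return rng.choices(rows, weights=weights, k=k)

# Hypothetical toy rows: (stratum, respondent id); "B" is oversampled.
rows = [("A", 1), ("A", 2), ("B", 3), ("B", 4)]
population_share = {"A": 0.8, "B": 0.2}

resampled = resample_to_population(rows, lambda r: r[0], population_share, k=10000)
share_a = sum(1 for r in resampled if r[0] == "A") / len(resampled)
print(share_a)
```

Resampling adds some Monte Carlo noise of its own, so weighting the predictions directly by the population proportions may be the simpler route when those proportions are available.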