For a project, I want to calculate an election forecast. I ran a survey and collected data (the sampling process was highly selective, so the data are biased). I also have access to a census dataset, so I know the real population (the electorate) very well. Now I want to do an MrP estimation in R based on my sample (the survey) and the census.
I have already written some code (see below), but I'm unsure whether this is the best or only way to go.
My questions are:
- How can I, in practice, get estimates for a categorical variable such as party preference with MrP? The examples I found online use a binary outcome like yes/no. In my use case, I have multiple parties to estimate.
- Is the code below appropriate and useful for my use case? With this code I run MrP separately for each party (e.g. the German “Union” party in the example), so I get estimates for all 7 parties. The problem is that when I sum the party estimates for my final joint estimate, the total is above or below 100%. How can I get the estimates for all parties at once with MrP, so that they sum to 1 (100%)?
- How does MrP handle variables with NA values in the sample that can't be matched to the census when fitting the model? Are these cases simply given a weight of 1?
- Running the code on my whole dataset (n = 10,000) takes really long (more than 10 minutes) in R, even though I have quite good hardware. Why is that?
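Regarding the first two questions, one way to make the party shares add up to 100% is to poststratify all parties together from an array of posterior draws (draws x census cells x parties) and, within every draw and cell, rescale the party probabilities so they sum to 1. The sketch below is a hypothetical base-R helper, not part of rstanarm; `poststratify_parties()`, `cell_n` and the toy numbers are illustrative assumptions, with `cell_n` playing the role of `census_mrp$votes_valid_party` in the code below. Renormalizing separately fitted binomial models is only a pragmatic workaround. A single categorical (multinomial) model, for example the `categorical()` family in the brms package, should give per-cell probabilities that already sum to 1, in which case `renormalize = FALSE` would apply.

```r
# Hypothetical helper (not from rstanarm or any other package).
# epred_array: posterior draws x census cells x parties, predicted probabilities
# cell_n:      census count (or valid votes) for each cell
poststratify_parties <- function(epred_array, cell_n, renormalize = TRUE) {
  stopifnot(length(dim(epred_array)) == 3,
            dim(epred_array)[2] == length(cell_n))
  if (renormalize) {
    # Within each draw and cell, rescale so the party probabilities sum to 1.
    # Needed only when the parties come from separately fitted binomial models.
    cell_totals <- apply(epred_array, c(1, 2), sum)
    epred_array <- sweep(epred_array, c(1, 2), cell_totals, "/")
  }
  w <- cell_n / sum(cell_n)  # census weight of each cell
  # For every draw and party: census-weighted average over cells.
  apply(epred_array, c(1, 3), function(p) sum(p * w))
}

set.seed(1)
parties <- c("union", "spd", "other")  # hypothetical party labels
# Toy stand-ins for posterior_epred() output of one binomial model per party:
# each matrix is 4 draws x 3 census cells.
epred_list <- lapply(parties, function(p) matrix(runif(12), nrow = 4, ncol = 3))
epred_array <- array(unlist(epred_list), dim = c(4, 3, length(parties)),
                     dimnames = list(NULL, NULL, parties))
cell_n <- c(500, 300, 200)  # hypothetical census counts per cell

shares <- poststratify_parties(epred_array, cell_n)  # draws x parties
rowSums(shares)        # every draw sums to 1
colMeans(shares)       # posterior mean vote share per party
apply(shares, 2, sd)   # posterior sd per party
```

Each row of `shares` is one posterior draw, so the per-party mean and sd are computed over draws, analogous to `mrp_estimate` in the single-party code.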
Here is the actual code for the MrP estimate of the “Union” party. (As described above, I'm not convinced by this approach; instead, I would like to calculate the vote shares of all parties in one MrP model.)
```r
library(rstanarm)

# create model: estimate union vote share by gender, age group,
# last voting decision (party_2017) and state
fit_model_sample_union <- stan_glmer(
  union_vote ~ 1 + (1|gender) + (1|agegroup) + (1|party_2017) + (1|state),
  family = binomial(link = "logit"),
  data = sample_mrp_union,
  prior = normal(0, 1, autoscale = TRUE),
  prior_covariance = decov(scale = 0.50),
  adapt_delta = 0.99,
  refresh = 0,
  seed = 111
)
print(fit_model_sample_union)

# calculate MRP estimate mean and sd
# epred_mat: draws x census cells matrix of predicted union vote probabilities
epred_mat <- posterior_epred(fit_model_sample_union, newdata = census_mrp, draws = 1000)
# weight each census cell by its share of valid votes and sum over cells
mrp_estimates_vector <- epred_mat %*% (census_mrp$votes_valid_party / sum(census_mrp$votes_valid_party))
mrp_estimate <- c(mean = mean(mrp_estimates_vector), sd = sd(mrp_estimates_vector))
cat("MRP estimate mean, sd: ", round(mrp_estimate, 5))
```
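On the runtime question: every predictor in the model above is a categorical grouping factor, so one commonly used speed-up is to collapse respondents into cells and fit a binomial model on successes and failures per cell instead of one Bernoulli row per respondent. The posterior stays the same (the binomial likelihood differs from the product of Bernoulli terms only by a constant), but the likelihood is evaluated over far fewer rows. The sketch below does the aggregation in base R on a hypothetical toy data frame (`survey`, the helper column `n` and the toy values are assumptions); the rstanarm call is shown only as a comment. Separately, `adapt_delta = 0.99` makes the sampler take smaller steps, which usually also increases run time.

```r
set.seed(2)
# Hypothetical toy survey: one row per respondent with a 0/1 outcome.
survey <- data.frame(
  gender     = sample(c("f", "m"), 200, replace = TRUE),
  agegroup   = sample(c("18-29", "30-59", "60+"), 200, replace = TRUE),
  union_vote = rbinom(200, size = 1, prob = 0.3)
)
survey$n <- 1  # each respondent counts as one trial

# Collapse to one row per gender x agegroup cell: successes and trials.
# In the real model you would group by gender, agegroup, party_2017 and state.
# Note: aggregate() drops rows with NA in any grouping variable.
cells <- aggregate(x = survey[c("union_vote", "n")],
                   by = survey[c("gender", "agegroup")],
                   FUN = sum)
cells

# Not run here: the aggregated data can then be fitted with rstanarm using a
# cbind(successes, failures) response, e.g.
# stan_glmer(cbind(union_vote, n - union_vote) ~ 1 + (1|gender) + (1|agegroup),
#            family = binomial(link = "logit"), data = cells)
```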
Looking forward to your ideas / help :-)