MRP with uniform weighting across levels, despite MNAR in observed data

Summary

I have a sample that is (unfortunately) unbalanced with respect to day-of-week, week-of-year, and other calendrical variables that are also important predictors.

I would like to know if there’s a “shortcut” to creating a full historical or future calendar and then using that as newdata in a post-stratification step. I’ve tried three approaches (see below). The third might be OK if I can understand why the credible interval is so large.

Apologies if this is too off-topic for this forum. You’ve all been gracious in answering my other questions! I’d be glad to post elsewhere if this is the case.

Example

Here’s an example where the true θ varies by day-of-week, such that \text{Pr(Y=1)} increases as the week goes on. But, the probability of missingness also increases as the week goes on.

This reprex illustrates three approaches:

  1. Full calendar as counterfactual newdata
  2. Passing wts based on observed (vs known true) frequencies
  3. Using allow_new_levels = TRUE and making up a level

In the actual setting, there are day-of-week, week-of-year, hour-of-day, and other such variables, but this example only includes one.

With many such variables, #1 is too computationally intensive, #2 gets really complex and therefore prone to operator error, and #3 seems OK, but I don’t think the confidence matches what I think I want to be asking for.

Is there a simple argument I’m overlooking, that I could pass to marginaleffects::avg_predictions(), or some other interface that yields PPD draws, to get what basically corresponds to marginalizing across these “calendrical” random variables with uniform weights across the known levels? I need to be able to specify other things in the counterfactual (e.g. year = 2020, …) at the same time.

Happy to add clarification or additional details as needed. Would be grateful to be pointed to a canonical example in some relatively accessible text, along the lines of Statistical Rethinking. Thank you in advance for your time and attention!

library(tidyverse)
library(marginaleffects)
library(brms)

# Adjust to suit your preferred backend
fit_logistic <- function (...) {
  fit <- brm(..., backend = "cmdstanr", family = bernoulli())
  return(fit)
}

# Helper functions
rbernoulli <- function (n, p = 0.5) runif(n) > (1 - p)
wday_levels <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
fct_wday <- function (x) factor(x, levels = wday_levels)

#' Higher theta for days that are later in the week
#' But also higher probability of missing observations
wday_theta <- set_names(seq(0.3, 0.7, length.out = 7), wday_levels)
wday_pr_na <- set_names(seq(0.2, 0.8, length.out = 7), wday_levels)

# All the data we wish we had
all_data <-
  tibble(
    wday = fct_wday(wday_levels),
    theta = wday_theta[wday],
    Y = map(theta, ~ rbernoulli(1000, p = .))) %>%
  unnest(cols = "Y")

# Observed data
obs_data <-
  sample_frac(
    all_data,
    1/3,
    weight = 1 - wday_pr_na[all_data$wday])

summarise(all_data, n = n(), mean = mean(Y)) # expect 0.5
summarise(obs_data, n = n(), mean = mean(Y)) # biased low

fit <- fit_logistic(Y ~ (1|wday), data = obs_data)
summary(fit)

avg_predictions(fit) # biased low, as expected

#'
#' This approach does the job efficiently in the simple case.
#'
#' But, this approach gets expensive as the counterfactual `calendar`
#' gains more dimensions (day-of-week, week-of-year, hour-of-day, ...).
#'
calendar <- tibble(wday = fct_wday(wday_levels))
avg_predictions(fit, newdata = calendar)

#'
#' Using `wts` also works in this simplified example, but is tricky to 
#' get right for certain reasons in the non-simplified real case
#'
wday_obs <- deframe(count(obs_data, wday))
wday_wts <- sum(wday_obs) / wday_obs
avg_predictions(fit, wts = wday_wts[fit$data$wday])

#'
#' `allow_new_levels = TRUE` seems to yield the desired expectation (?)
#’ but the confint seems too big ...
#' it's not that I don't know which level we're talking about and want that uncertainty too
#'
avg_predictions(fit, allow_new_levels = TRUE, newdata = tibble(wday = "(all)"))