Data for Bayesian hierarchical weighting adjustment and survey inference

Hi @jonah @rtrangucci -

I am going to use your hierarchical weighting method from your 2017 paper for an online survey project to do post-stratified inference. While the paper is pretty clear, the example code shown in the appendix isn’t as useful as it could be because the data referenced in the code isn’t available from the paper.

Do y’all have that data? It’s apparently from the ACS. I could dig up ACS data myself but I wanted to use the same dataset y’all used in the paper.

Also if you have your simulation code available, that would be really helpful too.

I’m posting on here as I assume others might be interested in being able to run the full example code. Also there is a function in the code, mrp_structured, that is referenced in rstanarm but that I can’t find in rstanarm on Github or CRAN.

As a reference, code cut and pasted from the appendix is below:

model_based_cell_weights <- function(object, cell_table) {
  stopifnot(
    is.data.frame(cell_table),
    colnames(cell_table) == c("N", "n")
  )
  draws <- as.matrix(object)
  Sigma <- draws[, grep("^Sigma\\[", colnames(draws)), drop = FALSE]
  sigma_theta_sq <- rowSums(Sigma)
  sigma_y_sq <- draws[, "sigma"]^2
  Ns <- cell_table[["N"]] # population cell counts
  ns <- cell_table[["n"]] # sample cell counts
  J <- nrow(cell_table)
  N <- sum(Ns)
  n <- sum(ns)
  # implementing equation 7 in the paper (although i did some algebra first to
  # simplify the expression a bit)
  Nsy2 <- N * sigma_y_sq
  ww <- matrix(NA, nrow = nrow(draws), ncol = J)
  for (j in 1:J) {
    ww[, j] <-
      (Nsy2 + n * Ns[j] * sigma_theta_sq) / (Nsy2 + N * ns[j] * sigma_theta_sq)
  }
  return(ww)
}
# prepare population data: acs_ad has age, eth, edu and inc
acs_ad %>%
  mutate(
    cell_id = paste0(age, eth, edu, inc)
  ) -> acs_ad
acs_design <- svydesign(id = ~1, weights = ~perwt, data = acs_ad)
agg_pop <-
  svytable( ~ age + eth + edu + inc, acs_design) %>%
  as.data.frame() %>%
  rename(N = Freq) %>%
  mutate(
    cell_id = paste0(age, eth, edu, inc)
  ) %>%
  filter(cell_id %in% acs_ad$cell_id)
# prepare data to pass to rstanarm
# SURVEYdata has 4 weighting variables: age, eth, edu and inc; and outcome Y
dat_rstanarm <-
  SURVEYdata %>%
  mutate(
    cell_id = paste0(age, eth, edu, inc)
  )%>%
  group_by(age, eth, edu, inc) %>%
  summarise(
    sd_cell = sd(Y),
    n = n(),
    Y = mean(Y),
    cell_id = first(cell_id)
  ) %>%
  mutate(sd_cell = if_else(is.na(sd_cell), 0, sd_cell)) %>%
  left_join(agg_pop[, c("cell_id", "N")], by = "cell_id")
# Stan fitting under structured prior in rstanarm
fit <-
  stan_glmer(
    formula =
      Y ~ 1 + (1 | age) + (1 | eth) + (1 | edu) + (1 | inc) +
      (1 | age:eth) + (1 | age:edu) + (1 | age:inc) +
      (1 | eth:edu) + (1 | eth:inc) +
      (1 | age:eth:edu) + (1 | age:eth:inc),
    data = dat_rstanarm, iter = 1000, chains = 4, cores = 4,
    prior_covariance =
      rstanarm::mrp_structured(
        cell_size = dat_rstanarm$n,
        cell_sd = dat_rstanarm$sd_cell,
        group_level_scale = 1,
        group_level_df = 1
      ),
    seed = 123,
    prior_aux = cauchy(0, 5),
    prior_intercept = normal(0, 100, autoscale = FALSE),
    adapt_delta = 0.99
  )
# model-based weighting
cell_table <- fit$data[,c("N","n")]
weights <- model_based_cell_weights(fit, cell_table)
weights <- data.frame(w_unit = colMeans(weights),
                      cell_id = fit$data[["cell_id"]],
                      Y = fit$data[["Y"]],
                      n = fit$data[["n"]]) %>%
  mutate(
    w = w_unit / sum(n / sum(n) * w_unit), # model-based weights
    Y_w = Y * w
  )
with(weights, sum(n * Y_w / sum(n)))# mean estimate

Thanks much for writing the method and for your help!

Bob

2 Likes

I have nothing to offer regarding the code, but you might want to check this thread for information on mrp_structured(): https://discourse.mc-stan.org/t/fitting-a-large-data-hierarchical-model-can-i-trust-variational-inference-algorithm/1780.

Specifically, @bgoodri said (in '17) that it’s available in one of rstanarm’s branches and that it can be installed via:

devtools::install_github("stan-dev/rstanarm", ref = "structured_prior_merge",
                         args = "--preclean", build_vignettes = FALSE)

I do have the data but unfortunately I don’t think I’m allowed to share it at the moment. The ACS stuff is public but most of the data required to run our code is from the New York City Longitudinal Study of Wellbeing and I don’t think they have made that data public yet (they definitely will at some point). @lauren do you by any chance have any idea what the timeline is for that?

And yeah it looks like the arxiv version of the paper needs to be updated to mention that the rstanarm version required is just on a branch. If we publish the paper then we’ll likely try to get it implemented in an rstanarm release at some point.

@Jonah can you link the paper so I can check what years you used so I can check?

Hello, have there been any developments on this thread? I am also interested in reproducing the analysis from this paper (if possible)…
Thanks!

1 Like

What do we want? Structured multilevel regression & post-stratification.

Where do we want it? It’d be nice to have it out in the rstanarm repo.

@jonah @lauren

1 Like

Hey @lauren, I’m pretty sure this is the paper:

I’m interested in access to the data, too.

1 Like

Hi, I am trying to install this branch using code

devtools::install_github("stan-dev/rstanarm", ref = "structured_prior_merge",
                         args = "--preclean", build_vignettes = FALSE)

Unfortunately, it failed

if (!require(devtools)) {
  install.packages("devtools")
  library(devtools)
}
#> Loading required package: devtools
#> Loading required package: usethis
install_github("stan-dev/rstanarm", ref = "structured_prior_merge", args = "--preclean", build_vignettes = FALSE)
#> Downloading GitHub repo stan-dev/rstanarm@structured_prior_merge
#> 
#>          checking for file 'C:\Users\liliz\AppData\Local\Temp\RtmpUN5WGe\remotes80448383394\stan-dev-rstanarm-b46c14f/DESCRIPTION' ...  v  checking for file 'C:\Users\liliz\AppData\Local\Temp\RtmpUN5WGe\remotes80448383394\stan-dev-rstanarm-b46c14f/DESCRIPTION'
#>       -  preparing 'rstanarm':
#>    checking DESCRIPTION meta-information ...     checking DESCRIPTION meta-information ...   v  checking DESCRIPTION meta-information
#> -  cleaning src
#>       -  running 'cleanup.win'
#>       -  checking for LF line-endings in source and make files and shell scripts (2.7s)
#>       -  checking for empty or unneeded directories
#>       -  building 'rstanarm_2.15.4.tar.gz'
#>   Warning:     Warning: file 'rstanarm/cleanup' did not have execute permissions: corrected
#>      
#> 
#> Installing package into 'C:/Users/liliz/OneDrive/Documents/R/win-library/4.0'
#> (as 'lib' is unspecified)
#> Warning in i.p(...): installation of package 'C:/Users/liliz/AppData/Local/Temp/
#> RtmpUN5WGe/file8041ce74daa/rstanarm_2.15.4.tar.gz' had non-zero exit status
#> Warning: 1 components of `...` were not used.
#> 
#> We detected these problematic arguments:
#> * `args`
#> 
#> Did you misspecify an argument?

Do you have any suggestions? Thanks!

I think the developers of the install_github function have changed its arguments and it no longer has an args argument. It looks like the ... argument to install_github is passed to install.packages so maybe instead of args = "--preclean" you could try INSTALL_opts = "--preclean" and see if that works, but I’m not entirely sure.

Also just a heads up: in theory this rstanarm branch should still work, but it’s quite old at this point, was always “experimental”, and hasn’t been maintained for a while so no guarantees unfortunately.

1 Like

Thank you for your quick response. I tried again, but still failed.

devtools::install_github("stan-dev/rstanarm", ref = "structured_prior_merge", INSTALL_opts = "--preclean", build_vignettes = FALSE)
#> Downloading GitHub repo stan-dev/rstanarm@structured_prior_merge
#> 
#>          checking for file 'C:\Users\liliz\AppData\Local\Temp\Rtmp8GTV3T\remotes62084cab602e\stan-dev-rstanarm-b46c14f/DESCRIPTION' ...  v  checking for file 'C:\Users\liliz\AppData\Local\Temp\Rtmp8GTV3T\remotes62084cab602e\stan-dev-rstanarm-b46c14f/DESCRIPTION'
#>       -  preparing 'rstanarm':
#>    checking DESCRIPTION meta-information ...     checking DESCRIPTION meta-information ...   v  checking DESCRIPTION meta-information
#> -  cleaning src
#>       -  running 'cleanup.win'
#>       -  checking for LF line-endings in source and make files and shell scripts (2.7s)
#>       -  checking for empty or unneeded directories
#>       -  building 'rstanarm_2.15.4.tar.gz'
#>   Warning:     Warning: file 'rstanarm/cleanup' did not have execute permissions: corrected
#>      
#> 
#> Installing package into 'C:/Users/liliz/OneDrive/Documents/R/win-library/4.0'
#> (as 'lib' is unspecified)
#> Warning in i.p(...): installation of package 'C:/Users/liliz/AppData/Local/Temp/
#> Rtmp8GTV3T/file62084a7643ab/rstanarm_2.15.4.tar.gz' had non-zero exit status

Does this imply that there is a problem with the rstanarm branch?

Hmm, I’m not sure. Can you try this and see if it changes anything?

Sys.setenv("R_REMOTES_NO_ERRORS_FROM_WARNINGS" = "true")

remotes::install_github("stan-dev/rstanarm", ref = "structured_prior_merge", INSTALL_opts = "--preclean", build_vignettes = FALSE)

Still same. There is ERROR: compilation failed for package ‘rstanarm’.

Sys.setenv("R_REMOTES_NO_ERRORS_FROM_WARNINGS" = "true")

remotes::install_github("stan-dev/rstanarm", ref = "structured_prior_merge", INSTALL_opts = "--preclean", build_vignettes = FALSE)
#> Downloading GitHub repo stan-dev/rstanarm@structured_prior_merge
#> 
#>          checking for file 'C:\Users\liliz\AppData\Local\Temp\Rtmpig5MV0\remotes2d604db4909\stan-dev-rstanarm-b46c14f/DESCRIPTION' ...  v  checking for file 'C:\Users\liliz\AppData\Local\Temp\Rtmpig5MV0\remotes2d604db4909\stan-dev-rstanarm-b46c14f/DESCRIPTION'
#>       -  preparing 'rstanarm':
#>    checking DESCRIPTION meta-information ...     checking DESCRIPTION meta-information ...   v  checking DESCRIPTION meta-information
#> -  cleaning src
#>       -  running 'cleanup.win'
#>       -  checking for LF line-endings in source and make files and shell scripts (2.5s)
#>       -  checking for empty or unneeded directories
#>       -  building 'rstanarm_2.15.4.tar.gz'
#>   Warning:     Warning: file 'rstanarm/cleanup' did not have execute permissions: corrected
#>      
#> 
#> Installing package into 'C:/Users/liliz/OneDrive/Documents/R/win-library/4.0'
#> (as 'lib' is unspecified)
#> Warning in i.p(...): installation of package 'C:/Users/liliz/AppData/Local/Temp/
#> Rtmpig5MV0/file2d6053991a97/rstanarm_2.15.4.tar.gz' had non-zero exit status
make: *** [C:/PROGRA~1/R/R-40~1.5/etc/x64/Makeconf:229: Modules.o] Error 1
ERROR: compilation failed for package 'rstanarm'
* removing 'C:/Users/liliz/OneDrive/Documents/R/win-library/4.0/rstanarm'
Warning message:
In i.p(...) :
  installation of package ‘C:/Users/liliz/AppData/Local/Temp/RtmpQZ7rjs/file48784f883b9a/rstanarm_2.15.4.tar.gz’ had non-zero exit status

Is there anyway you can share the R function of

rstanarm::mrp_structured

as I am really interested in this structured prior distributions. Thanks in advance!!

1 Like

Just to bump this conversation, I too am trying to replicate the weighting used in “Bayesian hierarchical weighting adjustment and survey inference” and am unable to install the structured_prior_merge branch on my machine. I’ve tried to lift the relevant parts from the GitHub repo (see, e.g., line 526 here: rstanarm/priors.R at structured_prior_merge · stan-dev/rstanarm · GitHub) but have hit errors along the way. In my most recent attempt, I locally added the functions of validate_parameter_value() and mrp_structured() but then get the following error when attempting to run the model as it appears in the paper’s appendix (Section 6. Discussion):

Error in check_stanfit(stanfit) : 
  Invalid stanfit object produced please report bug
In addition: Warning message:
In .local(object, ...) :
  some chains had errors; consider specifying chains = 1 to debug
Error in dimnamesGets(x, value) : 
  invalid dimnames given for “dgCMatrix” object
1 Like

Did you have any progress on this @apod? Have been trying as well, same process as you but a different issue though.