Struggles with Survey Weighting and Regression Modeling

Hi everyone,

I hope you are doing well and staying safe. I’d like to comment on, and ask about, a variant of the model proposed in the paper Struggles with Survey Weighting and Regression Modeling (Gelman, 2007).
Quoting Gelman: “However, the implicit weights (9) from hierarchical regression do depend on the data, implicitly, through the hyperparameters in \Sigma_y and \Sigma_β, which are estimated from the data. Thus, the appropriate weights could differ for different survey responses.”
As can be seen in equation (10), the posterior estimates are a weighted average of the cell mean and the overall mean (the partial-pooling effect).
My question relates to the fact that sometimes we only have the observed mean for each poststratification cell, instead of the individual observations. If we also knew the corresponding count of individuals (and the within-group standard deviation), could we replicate the shrinkage effect by including this information in the priors?
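
To make this concrete (this is just my own reasoning, so please correct me if it is off): if the individual responses in cell j are y_{ij} ~ N(\theta_j, \sigma_j^2), then their observed mean satisfies \bar{y}_j ~ N(\theta_j, \sigma_j^2 / n_j). My hope is that feeding the model \bar{y}_j with standard error \sigma_j / \sqrt{n_j} carries the same information about \theta_j as the n_j individual responses, so that the conditional posterior mean keeps the usual partial-pooling form

\hat{\theta}_j = \frac{(n_j / \sigma_j^2)\,\bar{y}_j + (1 / \tau^2)\,\mu}{n_j / \sigma_j^2 + 1 / \tau^2},

with cells that have larger n_j shrunk less towards the overall mean \mu.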

1 Like

Hey Juan, at least for me the question lacks easily addressable context, and I suspect a responder may have to go read the paper first in order to answer. If you can provide more context, perhaps with an example, more folks may be able to engage and offer help. Just IMHO.

1 Like

Hello @emiruz,

I can provide the 8-schools example with a slight modification. I am trying to understand how the number of students in each school modifies the weights and the overall mean. I might be making mistakes, because the results are counterintuitive: I’d expect shrinkage driven by sigma, but also by n_j.

schools.data <- list(
  n = 8,
  y = c(28, 8, -3, 7, -1, 1, 18, 12),
  sigma = c(10, 10, 16, 11, 9, 11, 10, 18),
  n_students = sqrt(c(1000, 2, 4, 1, 1, 2, 2, 1)) # square roots of the per-school counts
)

fit1 <- stan(
  file = "schools_example.stan",  # Stan program
  data = schools.data,            # named list of data
  chains = 4,                     # number of Markov chains
  warmup = 1000,                  # number of warmup iterations per chain
  iter = 2000,                    # total number of iterations per chain
  refresh = 1000                  # show progress every 'refresh' iterations
)

data {
  int<lower=0> n; //number of schools
  real y[n]; // effect of coaching
  vector<lower=0>[n] sigma; // standard errors of effects
  vector<lower=0>[n] n_students; // square root of the number of students per school (as passed in the data above)
}
transformed data{
}
parameters {
  real mu;  // the overall mean effect
  real<lower=0> tau; // between-school standard deviation of the effects
  vector[n] eta; // standardized school-level effects (see below)
}
transformed parameters {
  vector[n] theta; 
  theta = mu + tau * eta; // find theta from mu, tau, and eta
}
model {
  target += normal_lpdf(eta | 0, 1); // eta follows standard normal
  target += normal_lpdf(y | theta, sigma ./ n_students);  // y follows normal with mean theta and sd sigma ./ n_students, i.e. sigma / sqrt(count), since n_students holds sqrt of the counts
}

Very clear in this form :) @Max_Mantei is a slayer of these kinds of questions, so I’ll kindly ask him to take a punt.

1 Like

I had a quick look. I see what you mean: no matter how I fiddle with n_students, there’s no shrinkage. However, if I reduce the data from 8 to 3 records, then I do see the shrinkage effect that you’re after, which leads me to believe that some kind of conditioning is happening during updating.

Working with n=8 and comparing a run with n_students=sqrt(c(1000,2,4,1,1,2,2,1)) to one with n_students=sqrt(c(1,2,4,1,1,2,2,1)), I see that the value of tau is drastically higher in the former run than in the latter.
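
For reference, a quick way to compare the two runs is something like the sketch below (rough R, assuming schools_example.stan is the model posted above; the closer the posterior means of theta are to the raw y, the less shrinkage towards mu there is):

library(rstan)

# same data as above, swapping only n_students between the two runs
fit_large_n <- stan(file = "schools_example.stan",
                    data = modifyList(schools.data,
                                      list(n_students = sqrt(c(1000, 2, 4, 1, 1, 2, 2, 1)))),
                    chains = 4, warmup = 1000, iter = 2000)

fit_small_n <- stan(file = "schools_example.stan",
                    data = modifyList(schools.data,
                                      list(n_students = sqrt(c(1, 2, 4, 1, 1, 2, 2, 1)))),
                    chains = 4, warmup = 1000, iter = 2000)

# posterior means of theta next to the raw y: values close to y mean little shrinkage
print(round(cbind(y = schools.data$y,
                  theta_large = colMeans(rstan::extract(fit_large_n)$theta),
                  theta_small = colMeans(rstan::extract(fit_small_n)$theta)), 2))

# posterior summary of tau in each run
print(summary(fit_large_n, pars = "tau")$summary)
print(summary(fit_small_n, pars = "tau")$summary)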

Yes, but be careful: I am not sure I am approaching it in the correct way. I am not using weights in the formula, and that could be the reason why we don’t get any shrinkage. Still, I’d like to replicate the shrinkage that the number of observations would produce if they were included individually in the model.

Sorry, I haven’t had time to look at the paper yet. And believe it or not, I don’t think I have ever actually run the 8-schools example, haha… I’ll try to chime in when I find time to read the paper for a bit, ok?

2 Likes