Struggles with Survey Weighting and Regression Modeling

Hi everyone,

I hope you are doing well and staying safe. I’d like to ask about a variant of the model proposed in the paper Struggles with Survey Weighting and Regression Modeling (Gelman, 2007).
Quoting Gelman: “However, the implicit weights (9) from hierarchical regression do depend on the data, implicitly, through the hyperparameters in \Sigma_y and \Sigma_β, which are estimated from the data. Thus, the appropriate weights could differ for different survey responses.”
As can be seen in equation (10), the posterior estimates are a weighted average of the cell mean and the overall mean (the partial-pooling effect).
My question concerns the case where we only have the observed mean for each poststratification cell, rather than the individual observations. If we knew the corresponding count of individuals (and the within-group standard deviation) for each cell, could we replicate the shrinkage effect by including this information in the priors?
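To make the question concrete, here is the textbook normal-normal form of partial pooling with cell counts. This is a sketch under assumptions: the symbols \bar y_j (cell mean), n_j (cell count), \sigma_y (within-cell sd), \mu and \sigma_\theta (population mean and sd of the cell effects) are my own notation and may not match equation (10) of the paper term by term.

```latex
\bar y_j \mid \theta_j \sim \mathrm{N}\!\left(\theta_j,\ \frac{\sigma_y^2}{n_j}\right),
\qquad
\theta_j \sim \mathrm{N}\!\left(\mu,\ \sigma_\theta^2\right)

\mathrm{E}\!\left[\theta_j \mid \bar y_j, \mu, \sigma_y, \sigma_\theta\right]
= \frac{\dfrac{n_j}{\sigma_y^2}\,\bar y_j + \dfrac{1}{\sigma_\theta^2}\,\mu}
       {\dfrac{n_j}{\sigma_y^2} + \dfrac{1}{\sigma_\theta^2}}
```

In this form the count n_j enters only through the sampling variance of the cell mean, \sigma_y^2 / n_j: cells with larger n_j put more weight on their own mean and are shrunk less toward \mu.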

Hey Juan, at least for me the question lacks easily addressable context, and I suspect a responder would have to read the paper first in order to answer. If you can provide more context, perhaps with an example, more folks may be able to engage and offer help. Just IMHO.

Hello @emiruz,

I can provide the 8-schools example with a slight modification. I am trying to understand how the number of students in each school can modify the weights and the overall mean. I might be making mistakes, because the results are counterintuitive. I’d expect the shrinkage to depend on sigma, but also on n_j.

library(rstan)

schools_data <- list(
  n = 8,
  y = c(28, 8, -3, 7, -1, 1, 18, 12),
  sigma = c(10, 10, 16, 11, 9, 11, 10, 18)
  # n_students (one count per school) must also be supplied,
  # since the Stan program below declares it as data
)

fit1 <- stan(
  file = "schools_example.stan",  # Stan program
  data = schools_data,            # named list of data
  chains = 4,                     # number of Markov chains
  warmup = 1000,                  # number of warmup iterations per chain
  iter = 2000,                    # total number of iterations per chain
  refresh = 1000                  # show progress every 'refresh' iterations
)

data {
  int<lower=0> n;                 // number of schools
  vector[n] y;                    // estimated effect of coaching
  vector<lower=0>[n] sigma;       // standard errors of effects
  vector<lower=0>[n] n_students;  // number of students per school
}
parameters {
  real mu;            // the overall mean effect
  real<lower=0> tau;  // between-school standard deviation of the effects
  vector[n] eta;      // standardized school-level effects (see below)
}
transformed parameters {
  vector[n] theta;
  theta = mu + tau * eta;  // find theta from mu, tau, and eta
}
model {
  target += normal_lpdf(eta | 0, 1);  // eta follows standard normal
  // y follows normal with mean theta and sd sigma ./ n_students
  target += normal_lpdf(y | theta, sigma ./ n_students);
}

Very clear form :) @Max_Mantei is a slayer of these kinds of questions so I’ll kindly ask him to take a punt.

I had a quick look. I see what you mean: no matter how I fiddle with n_students, there’s no shrinkage. However, if I reduce the data from 8 to 3 records, then I see the shrinkage effect that you’re after, which leads me to believe that some kind of conditioning is happening during updating.

Working with n=8 and comparing a run with n_students=sqrt(c(1000,2,4,1,1,2,2,1)) against one with n_students=sqrt(c(1,2,4,1,1,2,2,1)), I see that the value of tau is drastically higher in the former run than in the latter.
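As a rough check on what n_students does in this model, here is a small R sketch of the conditional shrinkage weights for the two settings. It assumes the standard normal-normal result that, for fixed mu and tau, the posterior mean of theta_j is a precision-weighted average of y_j and mu. The helper `shrink` and the values `mu = 8` and `tau = 5` are hypothetical, chosen only for illustration; in the actual fit both are estimated.

```r
# Data from the thread
y     <- c(28, 8, -3, 7, -1, 1, 18, 12)
sigma <- c(10, 10, 16, 11, 9, 11, 10, 18)

# The two n_students settings compared in this post (already on the sqrt scale)
n_a <- sqrt(c(1000, 2, 4, 1, 1, 2, 2, 1))
n_b <- sqrt(c(1,    2, 4, 1, 1, 2, 2, 1))

# Hypothetical helper: conditional posterior mean of theta_j given mu and tau,
# where se is the sd used in the likelihood (sigma ./ n_students in the Stan code)
shrink <- function(y, se, mu, tau) {
  w <- (1 / se^2) / (1 / se^2 + 1 / tau^2)  # weight on the observed y_j
  list(weight = w, theta_hat = w * y + (1 - w) * mu)
}

mu  <- 8  # hypothetical value, for illustration only
tau <- 5  # hypothetical value, for illustration only

res_a <- shrink(y, sigma / n_a, mu, tau)
res_b <- shrink(y, sigma / n_b, mu, tau)

# Side-by-side weights on y_j for the two settings
print(round(cbind(w_a = res_a$weight, w_b = res_b$weight), 3))
```

For a fixed tau, a larger n_students only raises the weight on that school's own y_j. Because tau is estimated jointly in the full model, changing n_students also changes tau, so the shrinkage seen in a fit will not match these fixed-tau weights.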

Yes, but be careful: I am not sure I am approaching this the correct way. I am not using weights in the formula, and this may be the reason why we don’t get any shrinkage. Still, I’d like to replicate the shrinkage implied by the number of observations, as if they were in the model.

Sorry, I didn’t have time to have a look at the paper yet. And believe it or not, I think I never really ran the 8 schools example. haha… I’ll try to chime in on this when I find time to read the paper for a bit, ok?