Survey weighted regression

bgoodri · January 11, 2019, 4:04pm

The stan_glm and stan_lm functions treat weights a little differently. For most models, the weights are interpreted as frequency weights. For stan_lm, they can act like the weights in generalized least squares.
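
As a rough illustration of what a frequency-weight interpretation means in a hand-written Stan program (a sketch only, not rstanarm's internal code): multiplying the log-likelihood of observation i by w[i] has the same effect on the posterior as including that row w[i] times. The variable names and priors below are assumptions made for the example.

```stan
data {
  int<lower=1> N;         // number of rows
  vector[N] x;            // single predictor (assumed for illustration)
  vector[N] y;            // outcome
  vector<lower=0>[N] w;   // frequency weights
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  // priors (assumed scales, not taken from the thread)
  alpha ~ normal(0, 5);
  beta ~ normal(0, 5);
  sigma ~ exponential(1);

  // each row's log-likelihood is counted w[i] times
  for (i in 1:N) {
    target += w[i] * normal_lpdf(y[i] | alpha + beta * x[i], sigma);
  }
}
```

With all weights equal to 1 this reduces to an ordinary unweighted linear regression.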

realkrantz · January 11, 2019, 4:38pm

@Guido_Biele. Thanks for this very helpful approach. I have two questions:

(1) How could we make the relationship between the weight and the variance explicit in the model below?

The reason is that we have two types of inverse variance weights:

weight_RandomEffects = 1/(tau^2 + sigma^2)
weight_FixedEffects = 1/sigma^2

(2) Would it be correct to implement the relevant equations from here, here, here, or elsewhere directly in a Stan program, by adding a transformed parameters block to the model below?

For example,

transformed parameters {
...
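real tau;  // between-study SD: a single scalar, so not vector[N]
// w and y_mu are assumed to be defined in the elided code; sigmas comes from the data block below
tau = sqrt(sum(square(w) .* (square(y - y_mu) - square(sigmas))) / sum(square(w)));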
...
}

Guido_Biele:

The following model estimates the effect size such that each study’s weight depends on the studies’ effect sizes and associated standard deviations.
data {
  int N;            // number of studies
  vector[N] y;      // study effect sizes
  vector[N] sigmas; // standard deviations of study effect sizes
}
parameters {
  real mu;
}
model {
  mu ~ normal(0,2);
  y ~ normal(mu,sigmas);
}

Thanks in advance.
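
One way to make the relationship between the weights and the variances explicit, offered as a minimal sketch rather than as the model from any of the linked sources: extend the quoted model with a between-study standard deviation tau and marginalize over the study-level effects, so that study i enters the likelihood with variance tau^2 + sigmas[i]^2. The half-normal prior on tau and the names w_random and w_fixed are assumptions made for illustration.

```stan
data {
  int<lower=1> N;              // number of studies
  vector[N] y;                 // study effect sizes
  vector<lower=0>[N] sigmas;   // within-study standard deviations
}
parameters {
  real mu;                     // pooled effect size
  real<lower=0> tau;           // between-study standard deviation
}
model {
  mu ~ normal(0, 2);
  tau ~ normal(0, 1);          // half-normal via the lower bound (assumed prior)

  // marginal random-effects likelihood: Var(y[i]) = tau^2 + sigmas[i]^2
  y ~ normal(mu, sqrt(square(tau) + square(sigmas)));
}
generated quantities {
  // inverse variance weights implied by the two models
  vector[N] w_random = inv(square(tau) + square(sigmas));
  vector[N] w_fixed = inv(square(sigmas));
}
```

Here tau is estimated as a parameter rather than computed from a moment formula in transformed parameters, and the weights are a by-product of the likelihood instead of an input to it. Fixing tau at 0 recovers the fixed-effects model quoted above.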

Guido_Biele · January 12, 2019, 3:30pm

I can look a bit more into this on Monday.
Generally, you can use weights in Stan just as you can when computing maximum likelihood estimates (though not everyone agrees one should).
The reason I am hesitant in my answer is that I am unsure what the sigma in the inverse variance weights is. I assume it’s the sigma of the effect sizes, and that in addition one estimates an error variance, but I can’t tell without access to the paper. (I am also not sure what tau is.)

As an aside, you can use brms to estimate meta analysis models. See for example here: https://vuorre.netlify.com/post/2017/01/19/better-forest-plots-from-meta-analytic-models-estimated-with-brms/

realkrantz · January 12, 2019, 4:14pm

Great, @Guido_Biele. Here, we have two sources of variability in the effect sizes of the primary studies:

tau^2 = between-studies variance
sigma^2 = within-study variance

tau can be calculated using the formula shown above in the transformed parameters block.

sigma is calculated as follows:

These variances are then used to estimate the inverse variance weights:
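
Using the definitions given earlier in the thread, the inverse variance weights and the pooled estimate they imply can be written as follows. This is standard inverse-variance weighting; the study index i and the FE/RE superscripts are notation added here.

```latex
w_i^{\mathrm{FE}} = \frac{1}{\sigma_i^2},
\qquad
w_i^{\mathrm{RE}} = \frac{1}{\tau^2 + \sigma_i^2},
\qquad
\hat{\mu} = \frac{\sum_{i=1}^{N} w_i \, y_i}{\sum_{i=1}^{N} w_i}
```

With tau = 0 the two sets of weights coincide.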

Thanks in advance.

dkaplan · April 15, 2019, 9:19pm

Hi Bob,

In a 2017 post, you write

The closest the manual comes is a section on “Exploiting sufficient statistics”. I haven’t added anything on weighted regression since our regression experts, Ben Goodrich and Andrew Gelman, don’t like the weightings (other than those based on sufficient stats) because the resulting model isn’t properly Bayesian in that there’s no generative process for the weights.

This makes very good sense to me, and I was just wondering if you could point to a specific paper where Goodrich and Gelman spell out these issues.

Thanks in advance,

David

bgoodri · April 15, 2019, 11:04pm

That would be Gelman (2007)

see also the discussion, rejoinder, and citations for that issue.

dkaplan · April 16, 2019, 12:38am

Thanks!

David

dhanur88 · April 6, 2021, 3:51am

Hi,
I have fitted the following logistic regression model with survey weights. I was able to run this model without any errors, but it was very slow and produced all sorts of warnings, including divergent transitions and large R-hat values.

Are there any suggestions for improving the model, such as using different prior settings?

data {
  int<lower=1> N;
  int<lower=0,upper=1> y1[N];
  int<lower=1> K1;
  matrix[N,K1] x1;
  vector<lower=0>[N] weights;
}

parameters {
  real alpha1;
  vector[K1] beta;
}

model {
  vector[N] mu = alpha1 + x1 * beta;

  // priors
  beta ~ normal(0, 10);
  alpha1 ~ normal(0, 10);

  // weighted log-likelihood: each observation is scaled by its survey weight
  for (i in 1:N) {
    target += weights[i] * bernoulli_logit_lpmf(y1[i] | mu[i]);
  }
}

The warnings I got are as follows:

Thank you!!

mrwilli · April 6, 2021, 1:22pm

Here are a couple of ideas. First, check that the weights are in the data{} block. I’d also suggest setting all weights equal to 1 to see if the model works, and comparing the result to a “standard” unweighted logistic regression. If that all works, make sure the weights you use sum to the sample size and not the population size. In other words, the average weight should be 1. If you’ll forgive the self-promotion, and if you are using R, check out this package we are working on: RyanHornby/csSampling on GitHub.
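
The rescaling suggested above can be sketched directly in Stan, assuming the same data layout as the model in the question. The normalized weights w are computed once in transformed data so that they sum to N. The tighter normal(0, 2.5) priors are an assumption here, not something recommended in this thread, and presume predictors on roughly unit scale. The array[N] int declaration needs Stan 2.26 or later; older versions use the int y1[N] form shown in the question.

```stan
data {
  int<lower=1> N;
  array[N] int<lower=0, upper=1> y1;
  int<lower=1> K1;
  matrix[N, K1] x1;
  vector<lower=0>[N] weights;    // raw survey weights
}
transformed data {
  // rescale so the weights sum to the sample size (mean weight of 1)
  vector[N] w = weights * N / sum(weights);
}
parameters {
  real alpha1;
  vector[K1] beta;
}
model {
  vector[N] mu = alpha1 + x1 * beta;

  // priors (assumed scales; adjust to the scale of the predictors)
  alpha1 ~ normal(0, 2.5);
  beta ~ normal(0, 2.5);

  // weighted log-likelihood using the normalized weights
  for (i in 1:N) {
    target += w[i] * bernoulli_logit_lpmf(y1[i] | mu[i]);
  }
}
```

Passing a vector of ones as weights turns this into the unweighted logistic regression, which gives the baseline comparison described above.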

dhanur88 · April 6, 2021, 1:27pm

Thank you, I will look into the package you suggested. I included the weights in my original code but forgot to include them here.

dkaplan · May 24, 2022, 3:58pm

Hi Ben,

I wanted to return to this comment and a similar one you made in response to my post many moons ago. I tend to agree that using survey weights seems to be a violation of the likelihood principle. Similar arguments about violation of the likelihood principle have been leveled against the use of the p-value, but in any case, it seems that Bayesians should be consistent about this. Having said that, are you aware of any references that explicitly raise the issue of the likelihood principle in the context of sampling weights, namely that, as you say, one is conditioning on data that was never observed?

Thanks

David

mrwilli · May 25, 2022, 5:40pm

I realize the question was directed at Ben, but I think the most elegant fully Bayesian approach is jointly modelling the selection probabilities and the outcome of interest. These recent papers show you can do it for GLM-type models, but you have to use custom Stan code. I’m pretty sure the authors would be happy to share.

León-Novelo, L. G., and Savitsky, T. D. (2019). “Fully Bayesian estimation under informative sampling.” Electron. J. Statist. 13(1), 1608-1645.

Guido_Biele · May 26, 2022, 7:36am

@mrwilli: Do you know if the authors address the issue of model feedback? AFAIK one cannot set up a fully Bayesian model that jointly estimates weights and exposure-outcome associations without bias.
See e.g. the discussion in “On Bayesian estimation of marginal structural models” (PMC) or in “Model feedback in Bayesian propensity score estimation” (PubMed).

(I am ignoring the issue of weights not being Bayesian. There are some types of selection bias that MRP/standardization can’t deal with and that can only be handled with weights.)

mrwilli · May 27, 2022, 1:23am

Good question. I think it’s different for survey weights because they are observed, so this framework treats them as data and smooths them. Propensity weights, by contrast, have to be estimated from binary indicators. So I have not seen this issue of model feedback for co-modelling survey weights.

That being said, we just worked on using a survey sample to come up with propensity weights for a convenience sample, and I now wonder if we should check for this issue.

Guido_Biele · May 27, 2022, 6:41am

I agree that model feedback is not a problem if one tries to estimate the expected response in a population, as in opinion polls.
I was asking because the paper used the association between an exposure and an outcome as an example. For such analyses model feedback is a problem.

Model feedback introduces bias because in a joint weight and outcome model, weight parameters that maximize the (weighted) likelihood of the outcome are preferred. As a result, parameters of the weight and outcome model will be biased.
