Survey weighted regression

#8

@Guido_Biele I’m trying to implement this on a very simple model. Your code chunk says yHat[n], but I’m guessing that is a typo and it should be i instead of n. I tried making that change and I get the following error:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

Probabilty functions with suffixes _lpdf, _lpmf, _lcdf, and _lccdf,
require a vertical bar (|) between the first two arguments.
  error in 'model_no_groups_with_weights' at line 21, column 35
  -------------------------------------------------
    19:   }
    20:   for(i in 1:N){
    21:     target +=  normal_lpdf(yHat[i], sigma) * weights[i];
                                          ^
    22:   }
  -------------------------------------------------

PARSER EXPECTED: "|"

Then I replaced the , with a | as indicated in the error message, but I got a different error:


  error in 'model_no_groups_with_weights' at line 21, column 42
  -------------------------------------------------
    19:   }
    20:   for(i in 1:N){
    21:     target +=  normal_lpdf(yHat[i]|sigma) * weights[i];
                                                 ^
    22:   }
  -------------------------------------------------

In case it’s helpful, this is my full Stan code:

data {
  int<lower=0> N;    // number of data items
  int<lower=0> K;    // number of predictors
  vector[N] y;       // outcome vector
  vector[K] x[N];    // predictor matrix
  vector<lower=0>[N] weights;  // model weights 
}

parameters {
  real alpha;                   // intercept
  vector[K] beta;               // coefficients for predictors
  real<lower=0> sigma;          // error sd
}

model {
  real yHat[N];
  for(i in 1:N){
    yHat[i] = alpha + dot_product(x[i], beta); 
  }
  for(i in 1:N){
    target +=  normal_lpdf(yHat[i]|sigma) * weights[i];
  }
  beta ~ normal(0,1); 
}

What am I doing wrong?

Thanks again!

#9

Try

  for(i in 1:N){
    target +=  normal_lpdf(y[i] | yHat[i], sigma) * weights[i];
  }
#10

Really, @Guido_Biele’s post already had the solution, just with a typo.

#11

Hi,

(First-time poster, long-time user, and big-time appreciate-er of Stan. Thanks so much for all you guys do!)

Is the above suggestion equivalent to a standard inverse-variance weighting approach?

 for(i in 1:N){
    target +=  normal_lpdf(y[i] | yHat[i], sigma / sqrt(weights[i]));
  }

I may well have misunderstood, but I thought the latter was what @bgoodri does in rstanarm when weights are included.

Thanks and best,

Mariel

#12

That is not what rstanarm does when weights are included. The stan_glm function does something like


Consequently, this makes no sense from a Bayesian perspective unless the original dataset has been collapsed to its unique rows except for a column of weights that counts the number of times that row appears in the original dataset. Otherwise, you are conditioning on something that you did not observe.

#13

How does this (the weighted log-likelihood increment above) do inverse-variance weighting?

#14

It does not.

Instead, one has to calculate the standard deviation for the effect of each study outside Stan (see e.g. the compute.es R package, or equation 2 here) and provide it as data.

Then, the following model estimates the effect size such that the study weight depends on the studies’ effect sizes and associated standard deviations.

data {
  int N;            // number of studies
  vector[N] y;      // study effect sizes
  vector[N] sigmas; // standard deviations of study effect sizes
}
parameters {
  real mu;
}
model {
  mu ~ normal(0,2);
  y ~ normal(mu,sigmas);
}

Here, weighting is “implicit”, because larger studies have smaller variances, as you can see from the equation for the variance of Cohen’s d:

\sigma^2_d = \frac{n_a + n_b}{n_a n_b} + \frac{d^2}{2(n_a + n_b)}
where n_a and n_b are the sample sizes of the groups compared to obtain d. (details)
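(As a minimal sketch, assuming the two group sizes are available per study, that variance could be computed with a small helper in a Stan functions block; cohens_d_variance is a hypothetical name, not something from the model above.)

  functions {
    // variance of Cohen's d, given the observed d and the two group sizes
    real cohens_d_variance(real d, real n_a, real n_b) {
      return (n_a + n_b) / (n_a * n_b) + square(d) / (2 * (n_a + n_b));
    }
  }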

#15

Not sure I follow. If I’ve “observed” the sampling, can I not include that information in my estimation? Granted, you won’t have a posterior distribution, but rather a pseudo-posterior. The pseudo-posterior enjoys many desirable properties (reference) and can still be useful, I think.

#16

Very helpful insights, @Guido_Biele, @maxbiostat, @bgoodri.

How could " the …inclusion probabilities [be used] to exponentiate the likelihood contribution of each sampled unit" in a stan program?

Thanks in advance.

#17

In log space that would be

 for(i in 1:N){
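    // multiplying the log density by weights[i] is the same as raising the density to the power weights[i]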
    target +=  normal_lpdf(y[i] | yHat[i], sigma) * weights[i];
  }

as pointed out above.

#18

Thanks, @maxbiostat. That is very helpful.

#19

It isn’t Bayesian. The W_i people in the population that have the same background characteristics as the i-th person in the sample do not all have y_i as their outcome, but multiplying the log-likelihood for the i-th person by W_i assumes so. I would rather post-stratify predictions from a real posterior distribution than work with something that isn’t a real posterior distribution.
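(For concreteness, a minimal sketch of post-stratifying predictions in generated quantities; J, N_pop, and x_cell are hypothetical names for the number of post-stratification cells, their known population counts, and their predictor values, with alpha and beta from the regression above.)

  generated quantities {
    // population-level estimate: average the cell-level predictions,
    // weighting each cell by its known population count
    real theta_ps;
    {
      vector[J] cell_pred;
      for (j in 1:J)
        cell_pred[j] = alpha + dot_product(x_cell[j], beta);
      theta_ps = dot_product(N_pop, cell_pred) / sum(N_pop);
    }
  }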

#23

Thanks for clarifying! My confusion about how weights are treated by rstanarm stems from the fact that ?stan_lm says that they’re treated the "same as lm".

I don’t think lm can be taking an approach analogous to normal_lpdf(y[i] | yHat[i], sigma) * weights[i] though, because lm weights are invariant to scale:

> y <- rnorm(100)
> w <- runif(100)
> fit_w   <- lm(y ~ 1, weights = w)
> fit_10w <- lm(y ~ 1, weights = 10*w)
> SE <- function(fit) summary(fit)$coef['(Intercept)', 'Std. Error'] 
> SE(fit_w) == SE(fit_10w)
[1] TRUE

Perhaps it would be clearest to say that weights in rstanarm are replication counts that treat each observation as one or more real observations (analogous to Stata’s fweights).

This would help clarify that scale very much does matter for rstanarm weights – the sum of the weights is equal to the number of observations in the original dataset.

#24

The stan_glm and stan_lm functions do things a little differently. For most models, the weights are interpreted as frequency weights. For stan_lm, they can be like the weights for generalized least squares.

#25

@Guido_Biele. Thanks for this very helpful approach. I have two questions:

(1) How could we make the relationship between the weight and the variance explicit in the model below?

The reason is that we have two types of inverse variance weights:

weight_RandomEffects = 1/(tau^2 + sigma^2)
weight_FixedEffects = 1/sigma^2
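(One way to make that relationship explicit, as a sketch extending the meta-analysis model above: tau is estimated as a between-study SD, sigmas are the within-study SDs supplied as data, and each study’s implicit weight is then 1/(tau^2 + sigmas[i]^2).)

  parameters {
    real mu;
    real<lower=0> tau;  // between-study SD
  }
  model {
    mu ~ normal(0, 2);
    tau ~ normal(0, 1);
    // marginal random-effects likelihood: the implicit study weight
    // is the precision 1 / (tau^2 + sigmas[i]^2)
    y ~ normal(mu, sqrt(square(tau) + square(sigmas)));
  }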

(2) Could it be correct to implement the relevant equations from here, here, here, or elsewhere directly in a Stan program by adding a transformed parameters block to the model below?

For example,

transformed parameters {
...
real<lower=0> tau;
// tau = sqrt( sum_i w_i^2 * [ (y_i - y_mu)^2 - sigma_i^2 ] / sum_i w_i^2 )
tau = sqrt(sum(square(w) .* (square(y - y_mu) - square(sigma))) / sum(square(w)));
...
}

Thanks in advance.

#26

I can look a bit more into this on Monday.
Generally, you can use weights in Stan just as you would when computing maximum likelihood estimates. (Though not everyone agrees that one should.)
The reason I am hesitant in my answer is that I am unsure what the sigma in the inverse variance weights is. I assume it is the sigma of the effect sizes, and that in addition one estimates an error variance, but I can’t tell without access to the paper. (I am also not sure what tau is.)

As an aside, you can use brms to estimate meta analysis models. See for example here: https://vuorre.netlify.com/post/2017/01/19/better-forest-plots-from-meta-analytic-models-estimated-with-brms/

#27

Great, @Guido_Biele. Here, we have two sources of variability in the effect sizes of the primary studies:

tau^2 = between-studies variance
sigma^2 = within-study variance

tau can be calculated using the formula shown above in the transformed parameters block.

sigma is calculated as follows:

These variances are then used to estimate the inverse variance weights:

Thanks in advance.

#28

Hi Bob,

In a 2017 post, you write

The closest the manual comes is a section on “Exploiting sufficient statistics”. I haven’t added anything on weighted regression since our regression experts, Ben Goodrich and Andrew Gelman, don’t like the weightings (other than those based on sufficient stats) because the resulting model isn’t properly Bayesian in that there’s no generative process for the weights.

This makes very good sense to me, and I was just wondering if you could point to a specific paper where Goodrich and Gelman spell out these issues.

Thanks in advance,

David

#29

That would be Gelman (2007), “Struggles with Survey Weighting and Regression Modeling,” Statistical Science 22(2): 153–164.

See also the discussion, rejoinder, and citations for that issue.

#30

Thanks!

David