Hi all, I’m relatively new to rstanarm and would like to know what the “weights” argument is referring to. The data I tend to work with come with sampling weights because of the complex sampling designs of these surveys. Is that what is meant by “weights”? If so, I presume I simply identify the column in the data that contains the weights and take it from there? Also, if I am correct, I am also wondering if there is any example code someone can make available.

@Guido_Biele, @lauren and @jonah are best equipped to answer your question. This has been (at least) partially answered in this forum before, so I suggest you have a search around if you haven’t done so already.

They are numbers that multiply each observation’s contribution to the likelihood, irrespective of whether that makes sense, and it mostly makes sense in the context of maximum likelihood rather than Bayesian estimation. For Bayesian estimation, I would only use them if your original dataset had rows that were non-unique and you collapsed it to the unique rows with a count of the number of times each row occurred in the original data.
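For example, a minimal sketch of that collapsing step (assuming a data frame `df` with an outcome `y` and a predictor `x`; all names here are made up for illustration):

```r
library(dplyr)
library(rstanarm)

# Collapse to the unique rows, counting how many times each occurred
df_collapsed <- df %>%
  count(y, x, name = "n_copies")

# Multiplying each row's log-likelihood contribution by n_copies
# reproduces the fit to the full, uncollapsed data
fit <- stan_glm(y ~ x, data = df_collapsed, weights = n_copies)
```

This is the one case where the weights really are counts of observed data, so the analysis remains conditional on what was actually observed.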

Here is where I think one could/should make an exception to that rule:
In some circumstances, one can only obtain an unbiased estimate of a causal effect from observational data by using weights.* In addition, it is not possible to estimate these weights jointly with the causal effect.**
In such situations I have used weights. The challenge is then to account for uncertainty in the weights. A “brute force” method is to use not just one set of weights but a couple of hundred, run the weighted analysis that many times, and average over these analyses to get an estimate of the causal effect that also accounts for uncertainty in the weights.
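A sketch of that brute-force approach (hypothetical setup: `weight_draws` is an n_obs-by-n_sets matrix holding a few hundred posterior draws of the weights, and `df` holds the outcome `y` and predictor `x`; none of these names come from rstanarm itself):

```r
library(rstanarm)

# Fit one weighted analysis per set of weight draws
fits <- lapply(seq_len(ncol(weight_draws)), function(k) {
  df$w <- weight_draws[, k]
  stan_glm(y ~ x, data = df, weights = w, refresh = 0)
})

# Pool posterior draws across the weighted analyses; the resulting
# mixture also reflects the uncertainty in the estimated weights
pooled <- do.call(rbind, lapply(fits, as.matrix))
```

Averaging over the pooled draws then gives point summaries and intervals that propagate the weight uncertainty, at the cost of fitting the model once per weight set.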

* These are situations in which MRP does not reduce bias.
** Meaning that it is not possible to estimate unbiased parameters of the full generative model that includes selection into the study and the causal effect of interest.

If we are talking about an unbiased (point) estimator across samples of size N that could have been drawn from a population, then we are in the land of Frequentist estimation, which is what almost all of the literature on sampling weights assumes. If you upweight / downweight the i-th contribution to the likelihood by a (known) weight w_i but did not actually observe w_i copies of the data for the i-th row, then the resulting analysis isn’t conditional on the observed data.

Just because this could be the root of misunderstandings: my response specifically does not address weights used to obtain unbiased estimates from opinion or political surveys/polls, but rather weights used to obtain less biased estimates of causal effects. It is my understanding that these literatures are related but address different questions.

I did not think of bias as being tied to point estimators. I also do not remember discussions of selection bias in the context of (causal) effect estimation, which is what I often have to be concerned with, being tied to either a Bayesian or Frequentist framework, but I may be missing something there. When I wrote about trying to obtain an unbiased (or better, less biased) estimate of a causal effect, I meant obtaining a posterior distribution for a causal effect.

I am rather pragmatic and think that there are two problems to solve: (a) What approach can I use to reduce bias? This can involve choosing between MRP and weighting. (b) Estimating model parameters. I say that I am pragmatic because I treat these problems independently and have not spent much time reflecting on whether this is philosophically sound.

Formulated like this, I’d say that the analyses I do might not be conditional on the data. But I am not sure if what you write accurately describes what I am doing. Specifically, I do not use an observed weight w_i, but I use the data to estimate a posterior distribution of w_i, which I then use in the next step of the analysis.

I think I am not grasping what definition of unbiasedness you are using here. The expectation of what conditional on what is equal to the true average causal effect?

In any event, rstanarm is not estimating a posterior distribution of the weights, so as far as the OP is concerned, I think the weights argument only makes sense for Frequentist estimation or when a dataset has been collapsed to just the unique rows with a count of how many times each row originally appeared.

This has been a really interesting thread, and it reminds me of when Gelman wrote that “Survey weighting is a mess”. Another approach, which we have been taking, is to incorporate the variables that make up the weights (the strata) directly into the analysis. I believe this is referred to as model-based inference and might be more satisfactory to Bayesians. I was simply trying to understand whether the “weights” argument in rstanarm actually allows one to read in the sampling weights if they are available, regardless of whether that is philosophically satisfactory from a Bayesian perspective.

My understanding is that many people around here suggest using multilevel regression and poststratification (MRP; I think there is an rstanarm vignette from @lauren and others on that).

So if one is analysing an opinion-poll type survey, I’d use MRP.

Further, my comment tried to point out that for some questions where one is using observational data to estimate causal effects, there is no way around using weights, because MRP can’t do the trick.

The definition of bias I am using for estimates of causal effects is the difference between the causal effect in the population and the estimated effect in the study sample.

Yes, rstanarm does not produce a posterior distribution of weights. I was referring to the possibility of producing a posterior distribution of (propensity-score-based) weights from the output of a Bayesian analysis.

If you mean the difference between the causal effect in the population and the expectation of the estimator across all possible samples that could be drawn from that population, that is fine and the use of weights might achieve that, but that’s a Frequentist concept.

What Ben says is correct - I wouldn’t use the weights argument in rstanarm for survey weights. Essentially they’re not going to give you something analogous to what the survey package would (and there’s no way of specifying the design).

That is not what I mean. I mean the difference between the estimate of a causal effect in the population and the estimate of the causal effect from a particular sample at hand (which is not a random sample). Both estimates can be Bayesian posterior distributions, and the expectation plays a role only in that one would be most interested in differences between the means of posterior samples obtained from the population and from the study sample.

The reason I resist seeing bias as a Frequentist concept is that I use a Directed Acyclic Graph* to think about bias as a data-generating process that explains how differences between estimates from the population and from the sample come about, without needing to explicitly invoke Frequentist concepts. But maybe I am missing something here.

Anyhow, I did not want to hijack this thread.

* Which I have an easier time using than the potential outcomes framework …

Are you saying that if I have in my data the actual variables that were used in the calculation of the weights (e.g. strata) I shouldn’t use them as “fixed effects” in the analysis? Is there a paper out there somewhere that warns against this?

That’s not quite what I’m saying. What I’m saying is that things like the survey package will either take the survey design into account when running the analysis, or the survey weights will be accompanied by replicate weights to provide correct error estimates. What Ben says about how weights can be used in rstanarm is the only correct use.

In a large-scale survey, the weights are used to count the number of “hypothetical” individuals that the observed individual represents in the population from which the individual was sampled. I am presuming that this is not a good reason to use the weights argument in rstanarm, and that it should only be used if there was an actual collapsing of rows. Ben… am I understanding you correctly? Thanks!

Yes. You might observe that there are N_j people in the population that are the same as the j-th person in the sample on the demographic variables used to construct N_j. But you did not observe that all N_j such people in the population have the same value of the outcome variable as the j-th person in the sample. To upweight the likelihood contribution of the j-th person in the sample by N_j is to assume that all N_j such people in the population have the same value of the outcome variable as the j-th person in the sample, which you did not observe and is undoubtedly false. Thus, using the weights argument in that way would be to condition on data (in the population) that you did not observe, which is antithetical to Bayesian inference.
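To make the upweighting explicit: with weights w_j the model is fit to a log-likelihood of the form sum_j w_j * log p(y_j | theta), which is numerically identical to pretending w_j exact copies of row j were observed. A toy illustration for a Bernoulli outcome (not rstanarm internals, just the arithmetic of the argument above):

```r
# Weighted Bernoulli log-likelihood: each observation's contribution
# is multiplied by its weight
loglik_weighted <- function(theta, y, w) {
  sum(w * dbinom(y, size = 1, prob = theta, log = TRUE))
}

y <- c(1, 0)   # observed outcomes for two sampled people
w <- c(3, 2)   # hypothetical weights N_j

# Same value as observing 3 copies of y = 1 and 2 copies of y = 0:
loglik_weighted(0.4, y, w)
sum(dbinom(rep(y, times = w), size = 1, prob = 0.4, log = TRUE))
```

With survey-scale weights (hundreds of thousands per person), the posterior would concentrate as if that many outcomes had actually been observed, which is exactly the unwarranted conditioning described above.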

Many other modelling assumptions are undoubtedly false, as in “approximately correct”. I honestly don’t think a sweeping statement like “assuming homogeneity is always wrong and unhelpful” is useful here.

I see where you are coming from, but survey weights are popular for a reason. It doesn’t matter that your inference procedure is amazing if you can’t fit your model to data. In many cases, a full Bayesian treatment is not feasible, and a hybrid solution such as a pseudo-posterior can be useful.

I don’t know the words to make a sufficiently sweeping statement. If you have a sample of like 1000 from a population of 210 million adults in the United States, then the average number of people in the population that a person in the sample represents is going to be about 210,000 and will be much higher for people who are hard to sample. To assume that hundreds of thousands of people in the population all have exactly the same value on the outcome variable (or another variable that did not go into the weights) as the person in the sample that represents them would be scientific malpractice.

But no one actually claims that the N_j people in the population that are represented by the j-th person in the sample really are identical on all the variables that did not go into the weights. They just heard that you “must” use sampling weights in order to make the (point) estimator consistent (across all possible samples of that size that could have been drawn from the population) and presume that you also “must” use sampling weights in a Bayesian analysis.

Just returning to this for a moment. I fundamentally agree that this is a violation of the likelihood principle. I have not seen this argument in the literature. Do you have a citation?